I2I Translation


2023-05-03 Update

Image-based Indian Sign Language Recognition: A Practical Review using Deep Neural Networks

Authors: Mallikharjuna Rao K, Harleen Kaur, Sanjam Kaur Bedi, M A Lekhana

People with vocal and hearing disabilities use sign language to express themselves through visual gestures and signs. Although sign language addresses the communication difficulties faced by deaf people, a barrier remains because most of the general population cannot understand it, especially in places such as banks, airports, and supermarkets [1]. A sign language recognition (SLR) system is needed to solve this problem. The main focus of this work is to develop a real-time, word-level sign language recognition system that translates sign language into text. Much research has been done on ASL (American Sign Language); we therefore work on ISL (Indian Sign Language) to cater to the needs of the deaf and hard-of-hearing community in India [2]. In this research, we present an Indian Sign Language-based recognition system. The user captures hand gestures with a web camera, and the system predicts and displays the label of the captured image. The acquired image passes through several processing phases that use computer vision techniques, including grayscale conversion, dilation, and masking. Our model is trained using a convolutional neural network (CNN), which is then used to recognize the images. Our best model achieves 99% accuracy [3].
PDF 14 pages, 20 figures, 1 table

Click here to view paper screenshots
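The preprocessing chain described in the abstract (grayscale conversion, dilation, masking) followed by a CNN classifier can be sketched with OpenCV and Keras. This is a minimal illustration under assumed settings (class count, input resolution, Otsu thresholding as a stand-in for hand segmentation), not the authors' code.

```python
# Illustrative sketch (not the paper's implementation): grayscale -> mask ->
# dilation preprocessing, then a small CNN classifier. NUM_CLASSES, IMG_SIZE,
# and the Otsu-based segmentation are assumptions made for this example.
import cv2
import numpy as np
import tensorflow as tf

NUM_CLASSES = 50          # assumed number of ISL word classes
IMG_SIZE = 128            # assumed input resolution

def preprocess(frame_bgr: np.ndarray) -> np.ndarray:
    """Grayscale -> threshold mask -> dilation -> masked hand region."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Simple Otsu threshold as a stand-in for hand segmentation.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=2)
    hand = cv2.bitwise_and(gray, gray, mask=mask)
    hand = cv2.resize(hand, (IMG_SIZE, IMG_SIZE)).astype(np.float32) / 255.0
    return hand[..., np.newaxis]          # shape (IMG_SIZE, IMG_SIZE, 1)

def build_cnn() -> tf.keras.Model:
    """A small CNN of the kind typically used for word-level SLR."""
    return tf.keras.Sequential([
        tf.keras.layers.Input((IMG_SIZE, IMG_SIZE, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

if __name__ == "__main__":
    dummy = (np.random.rand(240, 320, 3) * 255).astype(np.uint8)  # webcam stand-in
    x = preprocess(dummy)
    model = build_cnn()
    probs = model(x[np.newaxis])           # predicted class probabilities
    print(probs.shape)                     # (1, NUM_CLASSES)
```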

StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video

Authors: Lizhen Wang, Xiaochen Zhao, Jingxiang Sun, Yuxiang Zhang, Hongwen Zhang, Tao Yu, Yebin Liu

Face reenactment methods attempt to restore and re-animate portrait videos as realistically as possible. Existing methods face a dilemma between quality and controllability: 2D GAN-based methods achieve higher image quality but offer coarser control of facial attributes than their 3D counterparts. In this work, we propose StyleAvatar, a real-time photo-realistic portrait avatar reconstruction method using StyleGAN-based networks, which can generate high-fidelity portrait avatars with faithful expression control. We expand the capabilities of StyleGAN by introducing a compositional representation and a sliding window augmentation method, which enable faster convergence and improve translation generalization. Specifically, we divide the portrait scene into three parts for adaptive adjustment: the facial region, the non-facial foreground region, and the background. In addition, our network combines the strengths of UNet, StyleGAN, and time coding for video learning, which enables high-quality video generation. Furthermore, a sliding window augmentation method and a pre-training strategy are proposed to improve translation generalization and training performance, respectively. The proposed network converges within two hours while ensuring high image quality and a forward rendering time of only 20 milliseconds. We also build a real-time live system, which further pushes the research toward applications. Results and experiments demonstrate the superiority of our method in image quality, full portrait video generation, and real-time re-animation compared to existing facial reenactment methods. Training and inference code is available at https://github.com/LizhenWangT/StyleAvatar.
PDF 8 pages, 5 figures, SIGGRAPH 2023 Conference Proceedings

Click here to view paper screenshots
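A minimal sketch of two ideas the abstract mentions: compositing a frame from the three separately handled regions (facial region, non-facial foreground, background) and sampling overlapping frame windows for sliding-window augmentation. The mask-based blend and the window/stride values are assumptions made for illustration, not the paper's actual pipeline.

```python
# Illustrative sketch (assumptions, not StyleAvatar's implementation): blend
# three generated layers with soft masks, and enumerate sliding frame windows.
import numpy as np

def composite(face, foreground, background, face_mask, fg_mask):
    """Blend three RGB layers of shape (H, W, 3) using masks in [0, 1]."""
    out = background.copy()
    out = fg_mask[..., None] * foreground + (1.0 - fg_mask[..., None]) * out
    out = face_mask[..., None] * face + (1.0 - face_mask[..., None]) * out
    return out

def sliding_windows(num_frames: int, window: int, stride: int):
    """Yield overlapping frame-index windows (the basic sliding-window idea)."""
    for start in range(0, max(num_frames - window + 1, 1), stride):
        yield np.arange(start, min(start + window, num_frames))

if __name__ == "__main__":
    H = W = 64
    face, fg, bg = (np.random.rand(H, W, 3) for _ in range(3))
    face_mask = np.zeros((H, W)); face_mask[16:48, 16:48] = 1.0
    fg_mask = np.full((H, W), 0.5)
    frame = composite(face, fg, bg, face_mask, fg_mask)
    print(frame.shape, len(list(sliding_windows(300, window=30, stride=10))))
```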

EgoLocate: Real-time Motion Capture, Localization, and Mapping with Sparse Body-mounted Sensors

Authors: Xinyu Yi, Yuxiao Zhou, Marc Habermann, Vladislav Golyanik, Shaohua Pan, Christian Theobalt, Feng Xu

Human and environment sensing are two important topics in Computer Vision and Graphics. Human motion is often captured by inertial sensors, while the environment is mostly reconstructed using cameras. We integrate the two techniques in EgoLocate, a system that simultaneously performs human motion capture (mocap), localization, and mapping in real time from sparse body-mounted sensors: six inertial measurement units (IMUs) and a monocular phone camera. On one hand, inertial mocap suffers from large translation drift due to the lack of a global positioning signal; EgoLocate leverages image-based simultaneous localization and mapping (SLAM) to locate the human in the reconstructed scene. On the other hand, SLAM often fails when visual features are poor; EgoLocate uses inertial mocap to provide a strong prior for the camera motion. Experiments show that localization, a key challenge for both fields, is largely improved by our technique compared with the state of the art in each field. Our code is available for research at https://xinyu-yi.github.io/EgoLocate/.
PDF Accepted by SIGGRAPH 2023. Project page: https://xinyu-yi.github.io/EgoLocate/

Click here to view paper screenshots
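As a rough intuition for why the two modalities complement each other, the toy sketch below fuses a drifting inertial motion prior with intermittent SLAM position fixes using a simple complementary blend. The fusion rule and the `alpha` weight are assumptions made for illustration; EgoLocate's actual optimization is far more involved.

```python
# Illustrative sketch (an assumption, not EgoLocate's algorithm): inertial
# mocap gives a smooth but drifting motion prior; SLAM gives drift-free but
# occasionally missing global fixes. A complementary blend combines them.
import numpy as np

def fuse_translation(prev_fused, mocap_delta, slam_position, alpha=0.1):
    """Propagate with the mocap motion delta, then pull toward SLAM when available."""
    predicted = prev_fused + mocap_delta          # inertial prior (drifts over time)
    if slam_position is None:                     # SLAM lost: poor visual features
        return predicted
    return (1.0 - alpha) * predicted + alpha * slam_position

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_pos = np.zeros(3)
    fused = np.zeros(3)
    drift = np.array([0.01, 0.0, 0.0])            # constant inertial translation drift
    for t in range(200):
        step = rng.normal(0.0, 0.02, 3)
        true_pos = true_pos + step
        mocap_delta = step + drift                # mocap delta corrupted by drift
        slam_fix = true_pos + rng.normal(0, 0.01, 3) if t % 5 == 0 else None
        fused = fuse_translation(fused, mocap_delta, slam_fix)
    print("final position error:", np.linalg.norm(fused - true_pos))
```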

Author: 木子已
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit 木子已 when reposting!