元宇宙/虚拟人


2022-12-01 更新

High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors

Authors:Yunpeng Bai, Yanbo Fan, Xuan Wang, Yong Zhang, Jingxiang Sun, Chun Yuan, Ying Shan

High-fidelity facial avatar reconstruction from a monocular video is a significant research problem in computer graphics and computer vision. Recently, Neural Radiance Field (NeRF) has shown impressive novel view rendering results and has been considered for facial avatar reconstruction. However, the complex facial dynamics and missing 3D information in monocular videos raise significant challenges for faithful facial reconstruction. In this work, we propose a new method for NeRF-based facial avatar reconstruction that utilizes 3D-aware generative prior. Different from existing works that depend on a conditional deformation field for dynamic modeling, we propose to learn a personalized generative prior, which is formulated as a local and low dimensional subspace in the latent space of 3D-GAN. We propose an efficient method to construct the personalized generative prior based on a small set of facial images of a given individual. After learning, it allows for photo-realistic rendering with novel views and the face reenactment can be realized by performing navigation in the latent space. Our proposed method is applicable for different driven signals, including RGB images, 3DMM coefficients, and audios. Compared with existing works, we obtain superior novel view synthesis results and faithfully face reenactment performance.
PDF 8 pages, 7 figures

点此查看论文截图

Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar

Authors:Aolan Sun, Xulong Zhang, Tiandong Ling, Jianzong Wang, Ning Cheng, Jing Xiao

Since the beginning of the COVID-19 pandemic, remote conferencing and school-teaching have become important tools. The previous applications aim to save the commuting cost with real-time interactions. However, our application is going to lower the production and reproduction costs when preparing the communication materials. This paper proposes a system called Pre-Avatar, generating a presentation video with a talking face of a target speaker with 1 front-face photo and a 3-minute voice recording. Technically, the system consists of three main modules, user experience interface (UEI), talking face module and few-shot text-to-speech (TTS) module. The system firstly clones the target speaker’s voice, and then generates the speech, and finally generate an avatar with appropriate lip and head movements. Under any scenario, users only need to replace slides with different notes to generate another new video. The demo has been released here and will be published as free software for use.
PDF Accepted by ICTAI2022. The 34th IEEE International Conference on Tools with Artificial Intelligence (ICTAI)

点此查看论文截图

Motion Projection Consistency Based 3D Human Pose Estimation with Virtual Bones from Monocular Videos

Authors:Guangming Wang, Honghao Zeng, Ziliang Wang, Zhe Liu, Hesheng Wang

Real-time 3D human pose estimation is crucial for human-computer interaction. It is cheap and practical to estimate 3D human pose only from monocular video. However, recent bone splicing based 3D human pose estimation method brings about the problem of cumulative error. In this paper, the concept of virtual bones is proposed to solve such a challenge. The virtual bones are imaginary bones between non-adjacent joints. They do not exist in reality, but they bring new loop constraints for the estimation of 3D human joints. The proposed network in this paper predicts real bones and virtual bones, simultaneously. The final length of real bones is constrained and learned by the loop constructed by the predicted real bones and virtual bones. Besides, the motion constraints of joints in consecutive frames are considered. The consistency between the 2D projected position displacement predicted by the network and the captured real 2D displacement by the camera is proposed as a new projection consistency loss for the learning of 3D human pose. The experiments on the Human3.6M dataset demonstrate the good performance of the proposed method. Ablation studies demonstrate the effectiveness of the proposed inter-frame projection consistency constraints and intra-frame loop constraints.
PDF 10 pages, 7 figures. Accepted by TCDS 2022

点此查看论文截图

Photo-realistic 360 Head Avatars in the Wild

Authors:Stanislaw Szymanowicz, Virginia Estellers, Tadas Baltrusaitis, Matthew Johnson

Delivering immersive, 3D experiences for human communication requires a method to obtain 360 degree photo-realistic avatars of humans. To make these experiences accessible to all, only commodity hardware, like mobile phone cameras, should be necessary to capture the data needed for avatar creation. For avatars to be rendered realistically from any viewpoint, we require training images and camera poses from all angles. However, we cannot rely on there being trackable features in the foreground or background of all images for use in estimating poses, especially from the side or back of the head. To overcome this, we propose a novel landmark detector trained on synthetic data to estimate camera poses from 360 degree mobile phone videos of a human head for use in a multi-stage optimization process which creates a photo-realistic avatar. We perform validation experiments with synthetic data and showcase our method on 360 degree avatars trained from mobile phone videos.
PDF ECCV 2022 Workshop on Computer Vision for Metaverse

点此查看论文截图

Efficient Customer Service Combining Human Operators and Virtual Agents

Authors:Yaniv Oshrat, Yonatan Aumann, Tal Hollander, Oleg Maksimov, Anita Ostroumov, Natali Shechtman, Sarit Kraus

The prospect of combining human operators and virtual agents (bots) into an effective hybrid system that provides proper customer service to clients is promising yet challenging. The hybrid system decreases the customers’ frustration when bots are unable to provide appropriate service and increases their satisfaction when they prefer to interact with human operators. Furthermore, we show that it is possible to decrease the cost and efforts of building and maintaining such virtual agents by enabling the virtual agent to incrementally learn from the human operators. We employ queuing theory to identify the key parameters that govern the behavior and efficiency of such hybrid systems and determine the main parameters that should be optimized in order to improve the service. We formally prove, and demonstrate in extensive simulations and in a user study, that with the proper choice of parameters, such hybrid systems are able to increase the number of served clients while simultaneously decreasing their expected waiting time and increasing satisfaction.
PDF

点此查看论文截图

Realistic, Animatable Human Reconstructions for Virtual Fit-On

Authors:Gayal Kuruppu, Bumuthu Dilshan, Shehan Samarasinghe, Nipuna Madhushan, Ranga Rodrigo

We present an end-to-end virtual try-on pipeline, that can fit different clothes on a personalized 3-D human model, reconstructed using a single RGB image. Our main idea is to construct an animatable 3-D human model and try-on different clothes in a 3-D virtual environment. The existing frame by frame volumetric reconstruction of 3-D human models are highly resource-demanding and do not allow clothes switching. Moreover, existing virtual fit-on systems also lack realism due to predominantly being 2-D or not using user’s features in the reconstruction. These shortcomings are due to either the human body or clothing model being 2-D or not having the user’s facial features in the dressed model. We solve these problems by manipulating a parametric representation of the 3-D human body model and stitching a head model reconstructed from the actual image. Fitting the 3-D clothing models on the parameterized human model is also adjustable to the body shape of the input image. Our reconstruction results, in comparison with recent existing work, are more visually-pleasing.
PDF

点此查看论文截图

文章作者: 木子已
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 木子已 !
  目录