GAN


2022-06-28 更新

RePaint: Inpainting using Denoising Diffusion Probabilistic Models

Authors:Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, Luc Van Gool

Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask. Most existing approaches train for a certain distribution of masks, which limits their generalization capabilities to unseen mask types. Furthermore, training with pixel-wise and perceptual losses often leads to simple textural extensions towards the missing areas instead of semantically meaningful generation. In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. We employ a pretrained unconditional DDPM as the generative prior. To condition the generation process, we only alter the reverse diffusion iterations by sampling the unmasked regions using the given image information. Since this technique does not modify or condition the original DDPM network itself, the model produces high-quality and diverse output images for any inpainting form. We validate our method for both faces and general-purpose image inpainting using standard and extreme masks. RePaint outperforms state-of-the-art Autoregressive, and GAN approaches for at least five out of six mask distributions. Github Repository: git.io/RePaint
PDF We missed out on other diffusion models that work on inpainting. We corrected that and apologize for this mistake

点此查看论文截图

Video2StyleGAN: Encoding Video in Latent Space for Manipulation

Authors:Jiyang Yu, Jingen Liu, Jing Huang, Wei Zhang, Tao Mei

Many recent works have been proposed for face image editing by leveraging the latent space of pretrained GANs. However, few attempts have been made to directly apply them to videos, because 1) they do not guarantee temporal consistency, 2) their application is limited by their processing speed on videos, and 3) they cannot accurately encode details of face motion and expression. To this end, we propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation. Based on the vision transformer, our network reuses the high-resolution portion of the latent vector to enforce temporal consistency. To capture subtle face motions and expressions, we design novel losses that involve sparse facial landmarks and dense 3D face mesh. We have thoroughly evaluated our approach and successfully demonstrated its application to various face video manipulations. Particularly, we propose a novel network for pose/expression control in a 3D coordinate system. Both qualitative and quantitative results have shown that our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
PDF

点此查看论文截图

2022-06-28 更新

Enhancing Quality of Pose-varied Face Restoration with Local Weak Feature Sensing and GAN Prior

Authors:Kai Hu, Yu Liu, Renhe Liu, Wei Lu, Gang Yu, Bin Fu

Facial semantic guidance (including facial landmarks, facial heatmaps, and facial parsing maps) and facial generative adversarial networks (GAN) prior have been widely used in blind face restoration (BFR) in recent years. Although existing BFR methods have achieved good performance in ordinary cases, these solutions have limited resilience when applied to face images with serious degradation and pose-varied (e.g., looking right, looking left, laughing, etc.) in real-world scenarios. In this work, we propose a well-designed blind face restoration network with generative facial prior. The proposed network is mainly comprised of an asymmetric codec and a StyleGAN2 prior network. In the asymmetric codec, we adopt a mixed multi-path residual block (MMRB) to gradually extract weak texture features of input images, which can better preserve the original facial features and avoid excessive fantasy. The MMRB can also be plug-and-play in other networks. Furthermore, thanks to the affluent and diverse facial priors of the StyleGAN2 model, we adopt a fine-tuned approach to flexibly restore natural and realistic facial details. Besides, a novel self-supervised training strategy is specially designed for face restoration tasks to fit the distribution closer to the target and maintain training stability. Extensive experiments on both synthetic and real-world datasets demonstrate that our model achieves superior performance to the prior art for face restoration and face super-resolution tasks.
PDF pdfLaTeX 2021, 11 pages with 15 figures

点此查看论文截图

MULTI-FLGANs: Multi-Distributed Adversarial Networks for Non-IID distribution

Authors:Akash Amalan, Rui Wang, Yanqi Qiao, Emmanouil Panaousis, Kaitai Liang

Federated learning is an emerging concept in the domain of distributed machine learning. This concept has enabled GANs to benefit from the rich distributed training data while preserving privacy. However, in a non-iid setting, current federated GAN architectures are unstable, struggling to learn the distinct features and vulnerable to mode collapse. In this paper, we propose a novel architecture MULTI-FLGAN to solve the problem of low-quality images, mode collapse and instability for non-iid datasets. Our results show that MULTI-FLGAN is four times as stable and performant (i.e. high inception score) on average over 20 clients compared to baseline FLGAN.
PDF

点此查看论文截图

文章作者: 木子已
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 木子已 !
  目录