GAN


Updated 2022-05-04

A Bidirectional Conversion Network for Cross-Spectral Face Recognition

Authors:Zhicheng Cao, Jiaxuan Zhang, Liaojun Pang

Face recognition in the infrared (IR) band has become an important supplement to visible-light face recognition thanks to its independence from background light, strong penetration, and ability to image under harsh conditions such as nighttime, rain, and fog. However, cross-spectral face recognition (i.e., VIS to IR) is very challenging due to the dramatic difference between visible-light and IR imagery as well as the lack of paired training data. This paper proposes a framework for bidirectional cross-spectral conversion (BCSC-GAN) between heterogeneous face images, and designs an adaptive weighted fusion mechanism based on information fusion theory. The network reduces the cross-spectral recognition problem to an intra-spectral one, and improves performance by fusing bidirectional information. Specifically, a face identity retaining module (IRM) is introduced to preserve identity features, and a new composite loss function is designed to overcome the modal differences caused by different spectral characteristics. Two datasets, TINDERS and CASIA, were tested, with comparisons in terms of FID, recognition rate, equal error rate, and normalized distance. Results show that the proposed network is superior to other state-of-the-art methods. Additionally, the proposed Self-Adaptive Weighted Fusion (SAWF) rule outperforms both the unfused case and other commonly used traditional fusion rules, further demonstrating the effectiveness and superiority of the proposed bidirectional conversion approach.
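To make the fusion idea concrete, here is a minimal score-fusion sketch. The quality-driven softmax weighting, the `sawf_fuse` helper, and its parameters are illustrative assumptions; the abstract does not specify the actual SAWF rule.

```python
import numpy as np

def sawf_fuse(score_vis2ir, score_ir2vis, q_vis2ir, q_ir2vis, tau=1.0):
    """Fuse match scores from the two conversion directions.

    Hypothetical rule: each direction's recognition score is weighted by
    a softmax over a per-direction quality estimate q (e.g., a confidence
    derived from the converted image). The paper's SAWF rule may differ.
    """
    w = np.exp(np.array([q_vis2ir, q_ir2vis]) / tau)
    w /= w.sum()  # normalize to adaptive weights that sum to 1
    return w[0] * score_vis2ir + w[1] * score_ir2vis

# Example: the IR->VIS direction is judged more reliable, so it dominates.
fused = sawf_fuse(score_vis2ir=0.62, score_ir2vis=0.80,
                  q_vis2ir=0.4, q_ir2vis=0.9)
print(round(fused, 3))
```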
PDF

Paper screenshots

Copy Motion From One to Another: Fake Motion Video Generation

Authors:Zhenguang Liu, Sifan Wu, Chejian Xu, Xiang Wang, Lei Zhu, Shuang Wu, Fuli Feng

One compelling application of artificial intelligence is to generate a video of a target person performing arbitrary desired motion (from a source person). While state-of-the-art methods are able to synthesize a video with similar broad-stroke motion, they generally lack texture details. A pertinent manifestation is distorted faces, feet, and hands, and such flaws are perceived very sensitively by human observers. Furthermore, current methods typically employ GANs with an L2 loss to assess the authenticity of the generated videos, inherently requiring a large amount of training samples to learn the texture details needed for adequate video generation. In this work, we tackle these challenges from three aspects: 1) We disentangle each video frame into foreground (the person) and background, focusing on generating the foreground to reduce the underlying dimension of the network output. 2) We propose a theoretically motivated Gromov-Wasserstein loss that facilitates learning the mapping from a pose to a foreground image. 3) To enhance texture details, we encode facial features with geometric guidance and employ local GANs to refine the face, feet, and hands. Extensive experiments show that our method is able to generate realistic target-person videos, faithfully copying complex motions from a source person. Our code and datasets are released at https://github.com/Sifann/FakeMotion
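A Gromov-Wasserstein loss compares the intra-domain pairwise-distance structure of two point sets, which is what lets it relate poses and foreground images that live in different spaces. Below is a minimal sketch assuming a fixed uniform coupling; the full loss also optimizes the coupling (e.g., with Sinkhorn-style iterations), and the paper's exact formulation may differ.

```python
import torch

def gw_discrepancy(x, y, coupling=None):
    """Gromov-Wasserstein-style discrepancy between two point clouds.

    x: (n, dx) points (e.g., pose keypoints); y: (m, dy) points (e.g.,
    foreground feature vectors). Only intra-domain distances are compared,
    so x and y may have different dimensionality.
    """
    c1 = torch.cdist(x, x)  # (n, n) pairwise distances within x
    c2 = torch.cdist(y, y)  # (m, m) pairwise distances within y
    n, m = x.shape[0], y.shape[0]
    if coupling is None:
        coupling = torch.full((n, m), 1.0 / (n * m))  # uniform transport plan
    # sum_{i,j,k,l} (C1[i,k] - C2[j,l])^2 * T[i,j] * T[k,l]
    diff = c1[:, None, :, None] - c2[None, :, None, :]  # (n, m, n, m)
    return (diff.pow(2)
            * coupling[:, :, None, None]
            * coupling[None, None, :, :]).sum()

pose = torch.randn(17, 2)   # 17 two-dimensional keypoints
feat = torch.randn(32, 64)  # 32 feature vectors of dimension 64
print(gw_discrepancy(pose, feat))
```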
PDF This paper is accepted to IJCAI 2022 (oral presentation)

Paper screenshots

End-to-End Visual Editing with a Generatively Pre-Trained Artist

Authors:Andrew Brown, Cheng-Yang Fu, Omkar Parkhi, Tamara L. Berg, Andrea Vedaldi

We consider the targeted image editing problem: blending a region in a source image with a driver image that specifies the desired change. Unlike prior works, we solve this problem by learning a conditional probability distribution of the edits, end-to-end. Training such a model requires addressing a fundamental technical challenge: the lack of example edits for training. To this end, we propose a self-supervised approach that simulates edits by augmenting off-the-shelf images in a target domain. The benefits are remarkable: implemented as a state-of-the-art auto-regressive transformer, our approach is simple, sidesteps difficulties of previous methods based on GAN-like priors, obtains significantly better edits, and is efficient. Furthermore, we show that different blending effects can be learned by intuitively controlling the augmentation process, with no other changes required to the model architecture. We demonstrate the superiority of this approach across several datasets in extensive quantitative and qualitative experiments, including human studies, significantly outperforming prior work.
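The edit-simulation idea can be sketched in a few lines: an off-the-shelf image serves as the ground-truth edit result, and an augmented copy of the edited region plays the role of the driver. The `simulate_edit` helper and the augmentation policy below are hypothetical; the paper's actual pipeline is not detailed in the abstract.

```python
import torch
import torchvision.transforms as T

# Hypothetical augmentation that turns a region crop into a "driver" image.
driver_aug = T.Compose([
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    T.RandomAffine(degrees=10, translate=(0.05, 0.05)),
])

def simulate_edit(image, box):
    """Build one self-supervised training example from a single image.

    image: (3, H, W) tensor in [0, 1]; box: (y0, x0, y1, x1) edit region.
    Returns (source, driver, mask, target): the model learns to blend the
    driver into the masked source so as to reconstruct the target.
    """
    y0, x0, y1, x1 = box
    mask = torch.zeros_like(image[:1])
    mask[:, y0:y1, x0:x1] = 1.0
    driver = driver_aug(image[:, y0:y1, x0:x1])  # perturbed region crop
    source = image * (1 - mask)                  # edit region blanked out
    target = image                               # ground-truth edit result
    return source, driver, mask, target

source, driver, mask, target = simulate_edit(torch.rand(3, 256, 256),
                                             (64, 64, 192, 192))
```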
PDF

Paper screenshots

BiOcularGAN: Bimodal Synthesis and Annotation of Ocular Images

Authors:Darian Tomašević, Peter Peer, Vitomir Štruc

Current state-of-the-art segmentation techniques for ocular images are critically dependent on large-scale annotated datasets, which are labor-intensive to gather and often raise privacy concerns. In this paper, we present a novel framework, called BiOcularGAN, capable of generating large-scale synthetic datasets of photorealistic (visible-light and near-infrared) ocular images, together with corresponding segmentation labels, to address these issues. At its core, the framework relies on a novel Dual-Branch StyleGAN2 (DB-StyleGAN2) model that facilitates bimodal image generation, and a Semantic Mask Generator (SMG) that produces semantic annotations by exploiting DB-StyleGAN2's feature space. We evaluate BiOcularGAN through extensive experiments across five diverse ocular datasets and analyze the effects of bimodal data generation on image quality and the produced annotations. Our experimental results show that BiOcularGAN is able to produce high-quality matching bimodal images and annotations (with minimal manual intervention) that can be used to train highly competitive (deep) segmentation models that perform well across multiple real-world datasets. The source code will be made publicly available.
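A toy sketch of the dual-branch idea: two lightweight output branches read the same shared synthesis features, so the visible-light and near-infrared outputs are pixel-aligned by construction, which is what allows a single segmentation mask to annotate both modalities. The `DualBranchHead` module and its layer sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DualBranchHead(nn.Module):
    """Illustrative dual-branch output head in the spirit of DB-StyleGAN2."""

    def __init__(self, feat_ch=512):
        super().__init__()
        self.to_vis = nn.Conv2d(feat_ch, 3, kernel_size=1)  # visible-light branch
        self.to_nir = nn.Conv2d(feat_ch, 1, kernel_size=1)  # near-infrared branch

    def forward(self, feats):
        # Both branches consume the same features (e.g., from a StyleGAN2
        # synthesis block), so the two modalities come out pixel-aligned.
        return self.to_vis(feats), self.to_nir(feats)

feats = torch.randn(2, 512, 64, 64)  # shared synthesis features
vis, nir = DualBranchHead()(feats)
print(vis.shape, nir.shape)  # (2, 3, 64, 64) and (2, 1, 64, 64)
```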
PDF 13 pages, 13 figures

Paper screenshots
