GAN


Updated 2022-08-03

3D Cartoon Face Generation with Controllable Expressions from a Single GAN Image

Authors: Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao

In this paper, we investigate an open research task of generating 3D cartoon face shapes from a single 2D GAN-generated human face without 3D supervision, where we can also manipulate the facial expressions of the 3D shapes. To this end, we discover the semantic meanings of the StyleGAN latent space, such that we are able to produce face images of various expressions, poses, and lighting by controlling the latent codes. Specifically, we first finetune the pretrained StyleGAN face model on the cartoon datasets. By feeding the same latent codes to the face and cartoon generation models, we aim to realize the translation from 2D human face images to cartoon-styled avatars. We then discover semantic directions of the GAN latent space, in an attempt to change the facial expressions while preserving the original identity. As we do not have any 3D annotations for cartoon faces, we manipulate the latent codes to generate images with different poses and lighting, such that we can reconstruct the 3D cartoon face shapes. We validate the efficacy of our method on three cartoon datasets qualitatively and quantitatively.
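
A minimal sketch of the latent-code workflow described above, assuming a pretrained StyleGAN face generator `face_G`, the same model finetuned on cartoons as `cartoon_G`, and an already-discovered expression direction `expr_direction`; these names are placeholders for illustration, not the authors' code:

```python
def edit_and_translate(face_G, cartoon_G, w, expr_direction, strength=2.0):
    """Edit a StyleGAN latent code along a semantic direction, then decode it
    with both the face generator and its cartoon-finetuned counterpart, so the
    edited expression carries over to the cartoon avatar.

    face_G, cartoon_G : callables mapping a latent code (a torch tensor) to an
                        image; they share a latent space because cartoon_G was
                        finetuned from face_G.
    expr_direction    : a discovered semantic direction with the same shape as w.
    """
    w_edited = w + strength * expr_direction   # expression edit in latent space
    face_img = face_G(w_edited)                # edited human face
    cartoon_img = cartoon_G(w_edited)          # same code -> cartoon-styled avatar
    return face_img, cartoon_img
```
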
PDF

Click here to view paper screenshots

Learning to Incorporate Texture Saliency Adaptive Attention to Image Cartoonization

Authors: Xiang Gao, Yuqi Zhang, Yingjie Tian

Image cartoonization is recently dominated by generative adversarial networks (GANs) from the perspective of unsupervised image-to-image translation, in which an inherent challenge is to precisely capture and sufficiently transfer characteristic cartoon styles (e.g., clear edges, smooth color shading, abstract fine structures, etc.). Existing advanced models try to enhance the cartoonization effect by learning to promote edges adversarially, introducing a style transfer loss, or learning to align style from multiple representation spaces. This paper demonstrates that a more distinct and vivid cartoonization effect can be easily achieved with only the basic adversarial loss. Observing that cartoon style is more evident in cartoon-texture-salient local image regions, we build a region-level adversarial learning branch in parallel with the normal image-level one, which constrains adversarial learning to cartoon-texture-salient local patches for better perceiving and transferring cartoon texture features. To this end, a novel cartoon-texture-saliency-sampler (CTSS) module is proposed to dynamically sample cartoon-texture-salient patches from training data. With extensive experiments, we demonstrate that texture saliency adaptive attention in adversarial learning, a missing ingredient of related methods in image cartoonization, is of significant importance in facilitating and enhancing image cartoon stylization, especially for high-resolution input pictures.
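
The idea of constraining adversarial learning to texture-salient patches can be illustrated with a toy sampler. The gradient-magnitude saliency proxy and all names below are assumptions made for this sketch; the paper's actual CTSS module is more elaborate:

```python
import torch
import torch.nn.functional as F

def sample_salient_patches(img, patch_size=96, num_patches=4, num_candidates=16):
    """Score random candidate patches with a crude texture-saliency proxy
    (mean local gradient magnitude) and keep the top-k per image, so a
    patch-level discriminator can focus on texture-salient regions."""
    gray = img.mean(dim=1, keepdim=True)                          # (B, 1, H, W)
    gx = (gray[..., :, 1:] - gray[..., :, :-1]).abs()
    gy = (gray[..., 1:, :] - gray[..., :-1, :]).abs()
    sal = F.pad(gx, (0, 1)) + F.pad(gy, (0, 0, 0, 1))             # (B, 1, H, W)

    B, _, H, W = img.shape
    patches = []
    for b in range(B):
        scores, coords = [], []
        for _ in range(num_candidates):                           # random candidate locations
            y = torch.randint(0, H - patch_size + 1, (1,)).item()
            x = torch.randint(0, W - patch_size + 1, (1,)).item()
            scores.append(sal[b, 0, y:y + patch_size, x:x + patch_size].mean())
            coords.append((y, x))
        top = torch.stack(scores).topk(num_patches).indices.tolist()
        for i in top:
            y, x = coords[i]
            patches.append(img[b, :, y:y + patch_size, x:x + patch_size])
    return torch.stack(patches)                                   # (B * num_patches, C, p, p)
```

The sampled patches would then be fed to a patch-level discriminator running in parallel with the usual image-level one.
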
PDF Proceedings of the 39th International Conference on Machine Learning, PMLR 162:7183-7207, 2022

Click here to view paper screenshots

Explicit Use of Fourier Spectrum in Generative Adversarial Networks

Authors: Soroush Sheikh Gargar

Generative Adversarial Networks have attracted researchers' attention due to their state-of-the-art performance in generating new images from only a dataset of the target distribution. It has been shown that there is a dissimilarity between the spectra of authentic images and fake ones. Since the Fourier transform is a bijective mapping, it is fair to conclude that the model has a significant problem in learning the original distribution. In this work, we investigate possible reasons for this drawback in the architecture and mathematical theory of current GANs. We then propose a new model to reduce the discrepancies between the spectra of real and fake images. To that end, we design a brand-new architecture for the frequency domain using the blueprint of geometric deep learning. Finally, we experimentally show promising improvements in the quality of the generated images by considering the Fourier-domain representation of the original data as a principal feature in the training process.
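
As a rough illustration of the spectral gap the thesis starts from (not the proposed frequency-domain architecture), one can compare the log-amplitude spectra of real and generated batches with PyTorch's FFT:

```python
import torch

def log_amplitude_spectrum(img):
    """2-D FFT log-amplitude spectrum of an image batch of shape (B, C, H, W)."""
    spec = torch.fft.fft2(img, norm="ortho")
    return torch.log1p(spec.abs())

def spectral_discrepancy(real, fake):
    """Mean absolute gap between the log-amplitude spectra of real and generated
    batches; the observation above is that this gap is non-trivial for standard
    GANs, and the proposed model is designed to shrink it."""
    return (log_amplitude_spectrum(real) - log_amplitude_spectrum(fake)).abs().mean()
```
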
PDF Master's thesis

Click here to view paper screenshots

Diffusion-Based Representation Learning

Authors: Korbinian Abstreiter, Sarthak Mittal, Stefan Bauer, Bernhard Schölkopf, Arash Mehrjou

Diffusion-based methods, represented as stochastic differential equations on a continuous-time domain, have recently proven successful as non-adversarial generative models. Training such models relies on denoising score matching, which can be seen as multi-scale denoising autoencoders. Here, we augment the denoising score matching framework to enable representation learning without any supervised signal. GANs and VAEs learn representations by directly transforming latent codes to data samples. In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective and thus encodes the information needed for denoising. We illustrate how this difference allows for manual control of the level of detail encoded in the representation. Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements over state-of-the-art models on semi-supervised image classification. We also compare the quality of representations learned with diffusion score matching against other methods such as autoencoders and contrastively trained systems through their performance on downstream tasks.
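
A hedged sketch of how a denoising-score-matching step can be conditioned on an encoder output so that the learned code must carry denoising-relevant information; the interfaces `score_net(x_noisy, z, sigma)` and `encoder(x)` are assumed here and do not reproduce the paper's exact objective:

```python
import torch

def conditional_dsm_loss(score_net, encoder, x, sigma):
    """One denoising-score-matching step augmented with a learned code:
    z = encoder(x) conditions the score network, so z is pushed to carry
    the information that helps denoise x."""
    noise = torch.randn_like(x)
    x_noisy = x + sigma * noise
    z = encoder(x)                       # representation of the clean sample
    target = -noise / sigma              # score of the Gaussian perturbation kernel
    pred = score_net(x_noisy, z, sigma)
    return ((pred - target) ** 2).mean()
```
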
PDF

Click here to view paper screenshots

Identity-guided Face Generation with Multi-modal Contour Conditions

Authors: Qingyan Bai, Weihao Xia, Fei Yin, Yujiu Yang

Recent face generation methods have tried to synthesize faces based on a given contour condition, such as a low-resolution image or a sketch. However, the problem of identity ambiguity remains unsolved, which usually occurs when the contour is too vague to provide reliable identity information (e.g., when its resolution is extremely low); thus, the feasible solutions for image restoration could be infinite. In this work, we propose a novel framework that takes the contour and an extra image specifying the identity as inputs, where the contour can be of various modalities, including a low-resolution image, a sketch, and a semantic label map. Concretely, we propose a novel dual-encoder architecture, in which an identity encoder extracts the identity-related feature, accompanied by a main encoder that obtains the rough contour information and further fuses all the information together. The encoder output is iteratively fed into a pre-trained StyleGAN generator until a satisfying result is obtained. To the best of our knowledge, this is the first work that achieves identity-guided face generation conditioned on multi-modal contour images. Moreover, our method can produce photo-realistic results at 1024$\times$1024 resolution.
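
A toy sketch of the dual-encoder layout, with placeholder layer sizes and a simple concatenate-and-project fusion; the paper's actual architecture, fusion scheme, and StyleGAN interface differ:

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    """Identity branch encodes the reference image, main branch encodes the
    contour condition (low-res image, sketch, or label map rendered as 3
    channels), and a fusion layer maps the concatenated features to a latent
    code intended for a pre-trained StyleGAN generator."""
    def __init__(self, latent_dim=512):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, latent_dim))
        self.identity_enc = branch()   # identity-related features
        self.contour_enc = branch()    # rough structural / contour information
        self.fuse = nn.Linear(2 * latent_dim, latent_dim)

    def forward(self, identity_img, contour_img):
        z_id = self.identity_enc(identity_img)
        z_ct = self.contour_enc(contour_img)
        return self.fuse(torch.cat([z_id, z_ct], dim=1))   # code for the generator
```
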
PDF Accepted to ICIP 2022

Click here to view paper screenshots
