2022-03-31 更新
StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation
Authors:Roy Or-El, Xuan Luo, Mengyi Shan, Eli Shechtman, Jeong Joon Park, Ira Kemelmacher-Shlizerman
We introduce a high resolution, 3D-consistent image and shape generation technique which we call StyleSDF. Our method is trained on single-view RGB data only, and stands on the shoulders of StyleGAN2 for image generation, while solving two main challenges in 3D-aware GANs: 1) high-resolution, view-consistent generation of the RGB images, and 2) detailed 3D shape. We achieve this by merging a SDF-based 3D representation with a style-based 2D generator. Our 3D implicit network renders low-resolution feature maps, from which the style-based network generates view-consistent, 1024x1024 images. Notably, our SDF-based 3D modeling defines detailed 3D surfaces, leading to consistent volume rendering. Our method shows higher quality results compared to state of the art in terms of visual and geometric quality.
PDF Camera-Ready version. Paper was accepted as oral to CVPR 2022. Added discussions and figures from the rebuttal to the supplementary material (sections C & F). Project Webpage: https://stylesdf.github.io/
论文截图
CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs
Authors:Jiteng Mu, Shalini De Mello, Zhiding Yu, Nuno Vasconcelos, Xiaolong Wang, Jan Kautz, Sifei Liu
Recent advances show that Generative Adversarial Networks (GANs) can synthesize images with smooth variations along semantically meaningful latent directions, such as pose, expression, layout, etc. While this indicates that GANs implicitly learn pixel-level correspondences across images, few studies explored how to extract them explicitly. In this work, we introduce Coordinate GAN (CoordGAN), a structure-texture disentangled GAN that learns a dense correspondence map for each generated image. We represent the correspondence maps of different images as warped coordinate frames transformed from a canonical coordinate frame, i.e., the correspondence map, which describes the structure (e.g., the shape of a face), is controlled via a transformation. Hence, finding correspondences boils down to locating the same coordinate in different correspondence maps. In CoordGAN, we sample a transformation to represent the structure of a synthesized instance, while an independent texture branch is responsible for rendering appearance details orthogonal to the structure. Our approach can also extract dense correspondence maps for real images by adding an encoder on top of the generator. We quantitatively demonstrate the quality of the learned dense correspondences through segmentation mask transfer on multiple datasets. We also show that the proposed generator achieves better structure and texture disentanglement compared to existing approaches. Project page: https://jitengmu.github.io/CoordGAN/
PDF Project page: https://jitengmu.github.io/CoordGAN/
论文截图
Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images
Authors:Ayush Tewari, Mallikarjun B R, Xingang Pan, Ohad Fried, Maneesh Agrawala, Christian Theobalt
Learning 3D generative models from a dataset of monocular images enables self-supervised 3D reasoning and controllable synthesis. State-of-the-art 3D generative models are GANs which use neural 3D volumetric representations for synthesis. Images are synthesized by rendering the volumes from a given camera. These models can disentangle the 3D scene from the camera viewpoint in any generated image. However, most models do not disentangle other factors of image formation, such as geometry and appearance. In this paper, we design a 3D GAN which can learn a disentangled model of objects, just from monocular observations. Our model can disentangle the geometry and appearance variations in the scene, i.e., we can independently sample from the geometry and appearance spaces of the generative model. This is achieved using a novel non-rigid deformable scene formulation. A 3D volume which represents an object instance is computed as a non-rigidly deformed canonical 3D volume. Our method learns the canonical volume, as well as its deformations, jointly during training. This formulation also helps us improve the disentanglement between the 3D scene and the camera viewpoints using a novel pose regularization loss defined on the 3D deformation field. In addition, we further model the inverse deformations, enabling the computation of dense correspondences between images generated by our model. Finally, we design an approach to embed real images into the latent space of our disentangled generative model, enabling editing of real images.
PDF CVPR 2022
论文截图
High-resolution Face Swapping via Latent Semantics Disentanglement
Authors:Yangyang Xu, Bailin Deng, Junle Wang, Yanqing Jing, Jia Pan, Shengfeng He
We present a novel high-resolution face swapping method using the inherent prior knowledge of a pre-trained GAN model. Although previous research can leverage generative priors to produce high-resolution results, their quality can suffer from the entangled semantics of the latent space. We explicitly disentangle the latent semantics by utilizing the progressive nature of the generator, deriving structure attributes from the shallow layers and appearance attributes from the deeper ones. Identity and pose information within the structure attributes are further separated by introducing a landmark-driven structure transfer latent direction. The disentangled latent code produces rich generative features that incorporate feature blending to produce a plausible swapping result. We further extend our method to video face swapping by enforcing two spatio-temporal constraints on the latent space and the image space. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art image/video face swapping methods in terms of hallucination quality and consistency. Code can be found at: https://github.com/cnnlstm/FSLSD_HiRes.
PDF Paper is Acctpted by CVPR2022
论文截图
TransductGAN: a Transductive Adversarial Model for Novelty Detection
Authors:Najiba Toron, Janaina Mourao-Miranda, John Shawe-Taylor
Novelty detection, a widely studied problem in machine learning, is the problem of detecting a novel class of data that has not been previously observed. A common setting for novelty detection is inductive whereby only examples of the negative class are available during training time. Transductive novelty detection on the other hand has only witnessed a recent surge in interest, it not only makes use of the negative class during training but also incorporates the (unlabeled) test set to detect novel examples. Several studies have emerged under the transductive setting umbrella that have demonstrated its advantage over its inductive counterpart. Depending on the assumptions about the data, these methods go by different names (e.g. transductive novelty detection, semi-supervised novelty detection, positive-unlabeled learning, out-of-distribution detection). With the use of generative adversarial networks (GAN), a segment of those studies have adopted a transductive setup in order to learn how to generate examples of the novel class. In this study, we propose TransductGAN, a transductive generative adversarial network that attempts to learn how to generate image examples from both the novel and negative classes by using a mixture of two Gaussians in the latent space. It achieves that by incorporating an adversarial autoencoder with a GAN network, the ability to generate examples of novel data points offers not only a visual representation of novelties, but also overcomes the hurdle faced by many inductive methods of how to tune the model hyperparameters at the decision rule level. Our model has shown superior performance over state-of-the-art inductive and transductive methods. Our study is fully reproducible with the code available publicly.
PDF