GAN


2022-08-20 更新

Style Your Hair: Latent Optimization for Pose-Invariant Hairstyle Transfer via Local-Style-Aware Hair Alignment

Authors:Taewoo Kim, Chaeyeon Chung, Yoonseo Kim, Sunghyun Park, Kangyeol Kim, Jaegul Choo

Editing hairstyle is unique and challenging due to the complexity and delicacy of hairstyle. Although recent approaches significantly improved the hair details, these models often produce undesirable outputs when a pose of a source image is considerably different from that of a target hair image, limiting their real-world applications. HairFIT, a pose-invariant hairstyle transfer model, alleviates this limitation yet still shows unsatisfactory quality in preserving delicate hair textures. To solve these limitations, we propose a high-performing pose-invariant hairstyle transfer model equipped with latent optimization and a newly presented local-style-matching loss. In the StyleGAN2 latent space, we first explore a pose-aligned latent code of a target hair with the detailed textures preserved based on local style matching. Then, our model inpaints the occlusions of the source considering the aligned target hair and blends both images to produce a final output. The experimental results demonstrate that our model has strengths in transferring a hairstyle under larger pose differences and preserving local hairstyle textures.
PDF Accepted to ECCV 2022

点此查看论文截图

Mind the Gap in Distilling StyleGANs

Authors:Guodong Xu, Yuenan Hou, Ziwei Liu, Chen Change Loy

StyleGAN family is one of the most popular Generative Adversarial Networks (GANs) for unconditional generation. Despite its impressive performance, its high demand on storage and computation impedes their deployment on resource-constrained devices. This paper provides a comprehensive study of distilling from the popular StyleGAN-like architecture. Our key insight is that the main challenge of StyleGAN distillation lies in the output discrepancy issue, where the teacher and student model yield different outputs given the same input latent code. Standard knowledge distillation losses typically fail under this heterogeneous distillation scenario. We conduct thorough analysis about the reasons and effects of this discrepancy issue, and identify that the mapping network plays a vital role in determining semantic information of generated images. Based on this finding, we propose a novel initialization strategy for the student model, which can ensure the output consistency to the maximum extent. To further enhance the semantic consistency between the teacher and student model, we present a latent-direction-based distillation loss that preserves the semantic relations in latent space. Extensive experiments demonstrate the effectiveness of our approach in distilling StyleGAN2 and StyleGAN3, outperforming existing GAN distillation methods by a large margin.
PDF Accepted by ECCV2022

点此查看论文截图

Text to Image Synthesis using Stacked Conditional Variational Autoencoders and Conditional Generative Adversarial Networks

Authors:Haileleol Tibebu, Aadil Malik, Varuna De Silva

Synthesizing a realistic image from textual description is a major challenge in computer vision. Current text to image synthesis approaches falls short of producing a highresolution image that represent a text descriptor. Most existing studies rely either on Generative Adversarial Networks (GANs) or Variational Auto Encoders (VAEs). GANs has the capability to produce sharper images but lacks the diversity of outputs, whereas VAEs are good at producing a diverse range of outputs, but the images generated are often blurred. Taking into account the relative advantages of both GANs and VAEs, we proposed a new stacked Conditional VAE (CVAE) and Conditional GAN (CGAN) network architecture for synthesizing images conditioned on a text description. This study uses Conditional VAEs as an initial generator to produce a high-level sketch of the text descriptor. This high-level sketch output from first stage and a text descriptor is used as an input to the conditional GAN network. The second stage GAN produces a 256x256 high resolution image. The proposed architecture benefits from a conditioning augmentation and a residual block on the Conditional GAN network to achieve the results. Multiple experiments were conducted using CUB and Oxford-102 dataset and the result of the proposed approach is compared against state-ofthe-art techniques such as StackGAN. The experiments illustrate that the proposed method generates a high-resolution image conditioned on text descriptions and yield competitive results based on Inception and Frechet Inception Score using both datasets
PDF

点此查看论文截图

文章作者: 木子已
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 木子已 !
  目录