Updated 2022-04-21

Sound-Guided Semantic Video Generation

Authors: Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Jihyun Bae, Chanyoung Kim, Won Jeong Ryoo, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

The recent success of StyleGAN demonstrates that the pre-trained StyleGAN latent space is useful for realistic video generation. However, the generated motion in the video is usually not semantically meaningful, because it is difficult to determine the direction and magnitude of movement in the StyleGAN latent space. In this paper, we propose a framework that generates realistic videos by leveraging a multimodal (sound-image-text) embedding space. As sound provides the temporal context of the scene, our framework learns to generate a video that is semantically consistent with the sound. First, our sound inversion module maps the audio directly into the StyleGAN latent space. We then incorporate the CLIP-based multimodal embedding space to further provide audio-visual relationships. Finally, the proposed frame generator learns to find a trajectory in the latent space that is coherent with the corresponding sound, and generates a video in a hierarchical manner. We provide a new high-resolution landscape video dataset (audio-visual pairs) for the sound-guided video generation task. The experiments show that our model outperforms state-of-the-art methods in terms of video quality. We further show several applications, including image and video editing, to verify the effectiveness of our method.

DAM-GAN : Image Inpainting using Dynamic Attention Map based on Fake Texture Detection

Authors: Dongmin Cha, Daijin Kim

Advances in deep neural networks have recently brought remarkable image synthesis performance to the field of image inpainting. The adoption of generative adversarial networks (GANs) in particular has accelerated progress in high-quality image reconstruction. However, although many notable GAN-based networks have been proposed for image inpainting, pixel artifacts and color inconsistencies, usually called fake textures, still occur in synthesized images during the generation process. To reduce the pixel inconsistency resulting from fake textures, we introduce a GAN-based model using a dynamic attention map (DAM-GAN). Our proposed DAM-GAN concentrates on detecting fake textures and produces dynamic attention maps to diminish pixel inconsistency in the feature maps of the generator. Evaluation results on the CelebA-HQ and Places2 datasets against other image inpainting approaches show the superiority of our network.
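One simple way to picture attention driven by fake-texture detection is to scale each feature-map location by how "real" a detector believes it is. The sketch below is only an illustration under that assumption: the function name `apply_dynamic_attention`, the `1 - p` weighting, and the nested-list representation are all hypothetical and are not taken from the DAM-GAN paper, whose actual architecture is not described in the abstract.

```python
def apply_dynamic_attention(features, fake_prob):
    """Down-weight feature-map pixels that a detector flags as fake texture.

    features:  2D list of feature values (one channel, for simplicity).
    fake_prob: 2D list of the same shape; each entry in [0, 1] is a
               hypothetical per-pixel probability that the pixel is fake.
    Returns a new 2D list where each feature is scaled by (1 - fake_prob).
    This weighting rule is an assumption made for illustration only.
    """
    if len(features) != len(fake_prob):
        raise ValueError("features and fake_prob must have the same height")
    attended = []
    for feat_row, prob_row in zip(features, fake_prob):
        if len(feat_row) != len(prob_row):
            raise ValueError("features and fake_prob must have the same width")
        attended.append([f * (1.0 - p) for f, p in zip(feat_row, prob_row)])
    return attended


# Toy usage: the pixel with fake probability 1.0 is fully suppressed.
toy_features = [[1.0, 2.0], [3.0, 4.0]]
toy_fake_prob = [[0.0, 0.5], [1.0, 0.25]]
toy_attended = apply_dynamic_attention(toy_features, toy_fake_prob)
```

In a real generator the map would be recomputed at each layer or iteration, which is presumably what makes it "dynamic"; that detail is likewise an assumption here.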

Efficient Subsampling of Realistic Images From GANs Conditional on a Class or a Continuous Variable

Authors: Xin Ding, Yongwei Wang, Z. Jane Wang, William J. Welch

Recently, subsampling or refining images generated from unconditional GANs has been actively studied to improve the overall image quality. Unfortunately, these methods are often observed to be less effective or inefficient in handling conditional GANs (cGANs), whether conditioned on a class (class-conditional GANs) or on a continuous variable (continuous cGANs, or CcGANs). In this work, we introduce an effective and efficient subsampling scheme, named conditional density ratio-guided rejection sampling (cDR-RS), to sample high-quality images from cGANs. Specifically, we first develop a novel conditional density ratio estimation method, termed cDRE-F-cSP, by proposing the conditional Softplus (cSP) loss and an improved feature extraction mechanism. We then derive the error bound of a density ratio model trained with the cSP loss. Finally, we accept or reject a fake image based on its estimated conditional density ratio. A filtering scheme is also developed to increase fake images’ label consistency without losing diversity when sampling from CcGANs. We extensively test the effectiveness and efficiency of cDR-RS in sampling from both class-conditional GANs and CcGANs on five benchmark datasets. When sampling from class-conditional GANs, cDR-RS outperforms modern state-of-the-art methods by a large margin (except DRE-F-SP+RS) in terms of effectiveness. Although the effectiveness of cDR-RS is often comparable to that of DRE-F-SP+RS, cDR-RS is substantially more efficient. When sampling from CcGANs, the superiority of cDR-RS is even more noticeable in terms of both effectiveness and efficiency. Notably, while consuming reasonable computational resources, cDR-RS can substantially reduce Label Score without decreasing the diversity of CcGAN-generated images, whereas other methods often need to trade much diversity for a slightly improved Label Score.
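The accept/reject step can be sketched as standard rejection sampling driven by a density ratio: a candidate x generated for label y is kept with probability r(x | y) / max r. The code below is a minimal sketch under stated assumptions, not the authors' implementation. In the paper the ratio comes from a network trained with the cSP loss; here `toy_density_ratio` is a hypothetical stand-in, the "images" are plain floats, and the names `rejection_subsample` and `n_keep` are invented for illustration.

```python
import math
import random


def toy_density_ratio(x, label):
    """Hypothetical stand-in for an estimated conditional density ratio.

    Assumption: samples close to the label are more "realistic".
    A real model would be a trained network, not this closed form.
    """
    return math.exp(-((x - label) ** 2))


def rejection_subsample(candidates, label, density_ratio, n_keep, rng=None):
    """Keep each candidate with probability r(x | label) / max r.

    Stops early once n_keep candidates have been accepted, so the
    result may contain fewer than n_keep items if too many are rejected.
    """
    rng = rng or random.Random(0)
    ratios = [density_ratio(x, label) for x in candidates]
    max_ratio = max(ratios)
    if max_ratio <= 0:
        raise ValueError("density ratios must contain a positive value")
    accepted = []
    for x, r in zip(candidates, ratios):
        # Acceptance probability lies in [0, 1] because r <= max_ratio.
        if rng.random() < r / max_ratio:
            accepted.append(x)
            if len(accepted) == n_keep:
                break
    return accepted


# Toy usage: candidates from a poor "generator" (uniform noise) for label 1.0.
gen_rng = random.Random(0)
toy_candidates = [gen_rng.uniform(-3.0, 3.0) for _ in range(1000)]
toy_kept = rejection_subsample(
    toy_candidates, 1.0, toy_density_ratio, n_keep=100, rng=random.Random(1)
)
```

Normalizing by the maximum ratio over the candidate pool is one common choice for the rejection constant; whether cDR-RS uses exactly this normalization is not stated in the abstract.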

木子已

https://ipaper.today/2022/04/21/2022-04-21-gan/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. When reposting, please credit the source: 木子已.
