GAN


2022-10-05 更新

Improving Sample Quality of Diffusion Models Using Self-Attention Guidance

Authors:Susung Hong, Gyuseong Lee, Wooseok Jang, Seungryong Kim

Following generative adversarial networks (GANs), a de facto standard model for image generation, denoising diffusion models (DDMs) have been actively researched and attracted strong attention due to their capability to generate images with high quality and diversity. However, the way the internal self-attention mechanism works inside the UNet of DDMs is under-explored. To unveil them, in this paper, we first investigate the self-attention operations within the black-boxed diffusion models and build hypotheses. Next, we verify the hypotheses about the self-attention map by conducting frequency analysis and testing the relationships with the generated objects. In consequence, we find out that the attention map is closely related to the quality of generated images. On the other hand, diffusion guidance methods based on additional information such as labels are proposed to improve the quality of generated images. Inspired by these methods, we present label-free guidance based on the intermediate self-attention map that can guide existing pretrained diffusion models to generate images with higher fidelity. In addition to the enhanced sample quality when used alone, we show that the results are further improved by combining our method with classifier guidance on ImageNet 128x128.
PDF Project Page: https://ku-cvlab.github.io/Self-Attention-Guidance

点此查看论文截图

Evaluation of Pre-Trained CNN Models for Geographic Fake Image Detection

Authors:Sid Ahmed Fezza, Mohammed Yasser Ouis, Bachir Kaddar, Wassim Hamidouche, Abdenour Hadid

Thanks to the remarkable advances in generative adversarial networks (GANs), it is becoming increasingly easy to generate/manipulate images. The existing works have mainly focused on deepfake in face images and videos. However, we are currently witnessing the emergence of fake satellite images, which can be misleading or even threatening to national security. Consequently, there is an urgent need to develop detection methods capable of distinguishing between real and fake satellite images. To advance the field, in this paper, we explore the suitability of several convolutional neural network (CNN) architectures for fake satellite image detection. Specifically, we benchmark four CNN models by conducting extensive experiments to evaluate their performance and robustness against various image distortions. This work allows the establishment of new baselines and may be useful for the development of CNN-based methods for fake satellite image detection.
PDF IEEE International Workshop on Multimedia Signal Processing (MMSP’2022)

点此查看论文截图

VToonify: Controllable High-Resolution Portrait Video Style Transfer

Authors:Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy

Generating high-quality artistic portrait videos is an important and desirable task in computer graphics and vision. Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency. In this work, we investigate the challenging controllable high-resolution portrait video style transfer by introducing a novel VToonify framework. Specifically, VToonify leverages the mid- and high-resolution layers of StyleGAN to render high-quality artistic portraits based on the multi-scale content features extracted by an encoder to better preserve the frame details. The resulting fully convolutional architecture accepts non-aligned faces in videos of variable size as input, contributing to complete face regions with natural motions in the output. Our framework is compatible with existing StyleGAN-based image toonification models to extend them to video toonification, and inherits appealing features of these models for flexible style control on color and intensity. This work presents two instantiations of VToonify built upon Toonify and DualStyleGAN for collection-based and exemplar-based portrait video style transfer, respectively. Extensive experimental results demonstrate the effectiveness of our proposed VToonify framework over existing methods in generating high-quality and temporally-coherent artistic portrait videos with flexible style controls.
PDF ACM Transactions on Graphics (SIGGRAPH Asia 2022). Code: https://github.com/williamyang1991/VToonify Project page: https://www.mmlab-ntu.com/project/vtoonify/

点此查看论文截图

Augmentation Backdoors

Authors:Joseph Rance, Yiren Zhao, Ilia Shumailov, Robert Mullins

Data augmentation is used extensively to improve model generalisation. However, reliance on external libraries to implement augmentation methods introduces a vulnerability into the machine learning pipeline. It is well known that backdoors can be inserted into machine learning models through serving a modified dataset to train on. Augmentation therefore presents a perfect opportunity to perform this modification without requiring an initially backdoored dataset. In this paper we present three backdoor attacks that can be covertly inserted into data augmentation. Our attacks each insert a backdoor using a different type of computer vision augmentation transform, covering simple image transforms, GAN-based augmentation, and composition-based augmentation. By inserting the backdoor using these augmentation transforms, we make our backdoors difficult to detect, while still supporting arbitrary backdoor functionality. We evaluate our attacks on a range of computer vision benchmarks and demonstrate that an attacker is able to introduce backdoors through just a malicious augmentation routine.
PDF 12 pages, 8 figures

点此查看论文截图

文章作者: 木子已
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 木子已 !
  目录