2022-07-10 更新
Generating Sparse Counterfactual Explanations For Multivariate Time Series
Authors:Jana Lang, Martin Giese, Winfried Ilg, Sebastian Otte
Since neural networks play an increasingly important role in critical sectors, explaining network predictions has become a key research topic. Counterfactual explanations can help to understand why classifier models decide for particular class assignments and, moreover, how the respective input samples would have to be modified such that the class prediction changes. Previous approaches mainly focus on image and tabular data. In this work we propose SPARCE, a generative adversarial network (GAN) architecture that generates SPARse Counterfactual Explanations for multivariate time series. Our approach provides a custom sparsity layer and regularizes the counterfactual loss function in terms of similarity, sparsity, and smoothness of trajectories. We evaluate our approach on real-world human motion datasets as well as a synthetic time series interpretability benchmark. Although we make significantly sparser modifications than other approaches, we achieve comparable or better performance on all metrics. Moreover, we demonstrate that our approach predominantly modifies salient time steps and features, leaving non-salient inputs untouched.
PDF 13 pages, 7 figures. Preprint. Under review; added appendix
点此查看论文截图
Text to Image Synthesis using Stacked Conditional Variational Autoencoders and Conditional Generative Adversarial Networks
Authors:Haileleol Tibebu, Aadin Malik, Varuna De Silva
Synthesizing a realistic image from textual description is a major challenge in computer vision. Current text to image synthesis approaches falls short of producing a highresolution image that represent a text descriptor. Most existing studies rely either on Generative Adversarial Networks (GANs) or Variational Auto Encoders (VAEs). GANs has the capability to produce sharper images but lacks the diversity of outputs, whereas VAEs are good at producing a diverse range of outputs, but the images generated are often blurred. Taking into account the relative advantages of both GANs and VAEs, we proposed a new stacked Conditional VAE (CVAE) and Conditional GAN (CGAN) network architecture for synthesizing images conditioned on a text description. This study uses Conditional VAEs as an initial generator to produce a high-level sketch of the text descriptor. This high-level sketch output from first stage and a text descriptor is used as an input to the conditional GAN network. The second stage GAN produces a 256x256 high resolution image. The proposed architecture benefits from a conditioning augmentation and a residual block on the Conditional GAN network to achieve the results. Multiple experiments were conducted using CUB and Oxford-102 dataset and the result of the proposed approach is compared against state-ofthe-art techniques such as StackGAN. The experiments illustrate that the proposed method generates a high-resolution image conditioned on text descriptions and yield competitive results based on Inception and Frechet Inception Score using both datasets
PDF
点此查看论文截图
Backdoor Attack is A Devil in Federated GAN-based Medical Image Synthesis
Authors:Ruinan Jin, Xiaoxiao Li
Deep Learning-based image synthesis techniques have been applied in healthcare research for generating medical images to support open research. Training generative adversarial neural networks (GAN) usually requires large amounts of training data. Federated learning (FL) provides a way of training a central model using distributed data from different medical institutions while keeping raw data locally. However, FL is vulnerable to backdoor attack, an adversarial by poisoning training data, given the central server cannot access the original data directly. Most backdoor attack strategies focus on classification models and centralized domains. In this study, we propose a way of attacking federated GAN (FedGAN) by treating the discriminator with a commonly used data poisoning strategy in backdoor attack classification models. We demonstrate that adding a small trigger with size less than 0.5 percent of the original image size can corrupt the FL-GAN model. Based on the proposed attack, we provide two effective defense strategies: global malicious detection and local training regularization. We show that combining the two defense strategies yields a robust medical image generation.
PDF 13 pages, 4 figures
点此查看论文截图
Memory Efficient Patch-based Training for INR-based GANs
Authors:Namwoo Lee, Hyunsu Kim, Gayoung Lee, Sungjoo Yoo, Yunjey Choi
Recent studies have shown remarkable progress in GANs based on implicit neural representation (INR) - an MLP that produces an RGB value given its (x, y) coordinate. They represent an image as a continuous version of the underlying 2D signal instead of a 2D array of pixels, which opens new horizons for GAN applications (e.g., zero-shot super-resolution, image outpainting). However, training existing approaches require a heavy computational cost proportional to the image resolution, since they compute an MLP operation for every (x, y) coordinate. To alleviate this issue, we propose a multi-stage patch-based training, a novel and scalable approach that can train INR-based GANs with a flexible computational cost regardless of the image resolution. Specifically, our method allows to generate and discriminate by patch to learn the local details of the image and learn global structural information by a novel reconstruction loss to enable efficient GAN training. We conduct experiments on several benchmark datasets to demonstrate that our approach enhances baseline models in GPU memory while maintaining FIDs at a reasonable level.
PDF 5 pages, 4 figures, arXiv preprint
点此查看论文截图
Mining Multi-Label Samples from Single Positive Labels
Authors:Youngin Cho, Daejin Kim, Mohammad Azam Khan, Jaegul Choo
Conditional generative adversarial networks (cGANs) have shown superior results in class-conditional generation tasks. In order to simultaneously control multiple conditions, cGANs require multi-label training datasets, where multiple labels can be assigned to each data instance. Nevertheless, the tremendous annotation cost limits the accessibility of multi-label datasets in the real-world scenarios. Hence, we explore the practical setting called single positive setting, where each data instance is annotated by only one positive label with no explicit negative labels. To generate multi-label data in the single positive setting, we propose a novel sampling approach called single-to-multi-label (S2M) sampling, based on the Markov chain Monte Carlo method. As a widely applicable “add-on” method, our proposed S2M sampling enables existing unconditional and conditional GANs to draw high-quality multi-label data with a minimal annotation cost. Extensive experiments on real image datasets verify the effectiveness and correctness of our method, even when compared to a model trained with fully annotated datasets.
PDF
点此查看论文截图
Towards Counterfactual Image Manipulation via CLIP
Authors:Yingchen Yu, Fangneng Zhan, Rongliang Wu, Jiahui Zhang, Shijian Lu, Miaomiao Cui, Xuansong Xie, Xian-Sheng Hua, Chunyan Miao
Leveraging StyleGAN’s expressivity and its disentangled latent codes, existing methods can achieve realistic editing of different visual attributes such as age and gender of facial images. An intriguing yet challenging problem arises: Can generative models achieve counterfactual editing against their learnt priors? Due to the lack of counterfactual samples in natural datasets, we investigate this problem in a text-driven manner with Contrastive-Language-Image-Pretraining (CLIP), which can offer rich semantic knowledge even for various counterfactual concepts. Different from in-domain manipulation, counterfactual manipulation requires more comprehensive exploitation of semantic knowledge encapsulated in CLIP as well as more delicate handling of editing directions for avoiding being stuck in local minimum or undesired editing. To this end, we design a novel contrastive loss that exploits predefined CLIP-space directions to guide the editing toward desired directions from different perspectives. In addition, we design a simple yet effective scheme that explicitly maps CLIP embeddings (of target text) to the latent space and fuses them with latent codes for effective latent code optimization and accurate editing. Extensive experiments show that our design achieves accurate and realistic editing while driving by target texts with various counterfactual concepts.
PDF This paper has been accepted to ACM MM 2022, code may be found here: https://github.com/yingchen001/CF-CLIP
点此查看论文截图
Disentangling Random and Cyclic Effects in Time-Lapse Sequences
Authors:Erik Härkönen, Miika Aittala, Tuomas Kynkäänniemi, Samuli Laine, Timo Aila, Jaakko Lehtinen
Time-lapse image sequences offer visually compelling insights into dynamic processes that are too slow to observe in real time. However, playing a long time-lapse sequence back as a video often results in distracting flicker due to random effects, such as weather, as well as cyclic effects, such as the day-night cycle. We introduce the problem of disentangling time-lapse sequences in a way that allows separate, after-the-fact control of overall trends, cyclic effects, and random effects in the images, and describe a technique based on data-driven generative models that achieves this goal. This enables us to “re-render” the sequences in ways that would not be possible with the input images alone. For example, we can stabilize a long sequence to focus on plant growth over many months, under selectable, consistent weather. Our approach is based on Generative Adversarial Networks (GAN) that are conditioned with the time coordinate of the time-lapse sequence. Our architecture and training procedure are designed so that the networks learn to model random variations, such as weather, using the GAN’s latent space, and to disentangle overall trends and cyclic variations by feeding the conditioning time label to the model using Fourier features with specific frequencies. We show that our models are robust to defects in the training data, enabling us to amend some of the practical difficulties in capturing long time-lapse sequences, such as temporary occlusions, uneven frame spacing, and missing frames.
PDF Accepted to SIGGRAPH 2022. Code: https://github.com/harskish/tlgan
点此查看论文截图
Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in VIS and NIR Scenario
Authors:Yukai Wang, Chunlei Peng, Decheng Liu, Nannan Wang, Xinbo Gao
In recent years, with the rapid development of face editing and generation, more and more fake videos are circulating on social media, which has caused extreme public concerns. Existing face forgery detection methods based on frequency domain find that the GAN forged images have obvious grid-like visual artifacts in the frequency spectrum compared to the real images. But for synthesized videos, these methods only confine to single frame and pay little attention to the most discriminative part and temporal frequency clue among different frames. To take full advantage of the rich information in video sequences, this paper performs video forgery detection on both spatial and temporal frequency domains and proposes a Discrete Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation. FCAN-DCT consists of a backbone network and two branches: Compact Feature Extraction (CFE) module and Frequency Temporal Attention (FTA) module. We conduct thorough experimental assessments on two visible light (VIS) based datasets WildDeepfake and Celeb-DF (v2), and our self-built video forgery dataset DeepfakeNIR, which is the first video forgery dataset on near-infrared modality. The experimental results demonstrate the effectiveness of our method on detecting forgery videos in both VIS and NIR scenarios.
PDF