Diffusion Models


2023-05-16 更新

On enhancing the robustness of Vision Transformers: Defensive Diffusion

Authors:Raza Imam, Muhammad Huzaifa, Mohammed El-Amine Azz

Privacy and confidentiality of medical data are of utmost importance in healthcare settings. ViTs, the SOTA vision model, rely on large amounts of patient data for training, which raises concerns about data security and the potential for unauthorized access. Adversaries may exploit vulnerabilities in ViTs to extract sensitive patient information and compromising patient privacy. This work address these vulnerabilities to ensure the trustworthiness and reliability of ViTs in medical applications. In this work, we introduced a defensive diffusion technique as an adversarial purifier to eliminate adversarial noise introduced by attackers in the original image. By utilizing the denoising capabilities of the diffusion model, we employ a reverse diffusion process to effectively eliminate the adversarial noise from the attack sample, resulting in a cleaner image that is then fed into the ViT blocks. Our findings demonstrate the effectiveness of the diffusion model in eliminating attack-agnostic adversarial noise from images. Additionally, we propose combining knowledge distillation with our framework to obtain a lightweight student model that is both computationally efficient and robust against gray box attacks. Comparison of our method with a SOTA baseline method, SEViT, shows that our work is able to outperform the baseline. Extensive experiments conducted on a publicly available Tuberculosis X-ray dataset validate the computational efficiency and improved robustness achieved by our proposed architecture.
PDF Our code is publicly available at https://github.com/Muhammad-Huzaifaa/Defensive_Diffusion

点此查看论文截图

Meta-DM: Applications of Diffusion Models on Few-Shot Learning

Authors:Wentao Hu, Xiurong Jiang, Jiarun Liu, Yuqi Yang, Hui Tian

In the field of few-shot learning (FSL), extensive research has focused on improving network structures and training strategies. However, the role of data processing modules has not been fully explored. Therefore, in this paper, we propose Meta-DM, a generalized data processing module for FSL problems based on diffusion models. Meta-DM is a simple yet effective module that can be easily integrated with existing FSL methods, leading to significant performance improvements in both supervised and unsupervised settings. We provide a theoretical analysis of Meta-DM and evaluate its performance on several algorithms. Our experiments show that combining Meta-DM with certain methods achieves state-of-the-art results.
PDF

点此查看论文截图

Diffusion Models for Imperceptible and Transferable Adversarial Attack

Authors:Jianqi Chen, Hao Chen, Keyan Chen, Yilan Zhang, Zhengxia Zou, Zhenwei Shi

Many existing adversarial attacks generate $L_p$-norm perturbations on image RGB space. Despite some achievements in transferability and attack success rate, the crafted adversarial examples are easily perceived by human eyes. Towards visual imperceptibility, some recent works explore unrestricted attacks without $L_p$-norm constraints, yet lacking transferability of attacking black-box models. In this work, we propose a novel imperceptible and transferable attack by leveraging both the generative and discriminative power of diffusion models. Specifically, instead of direct manipulation in pixel space, we craft perturbations in latent space of diffusion models. Combined with well-designed content-preserving structures, we can generate human-insensitive perturbations embedded with semantic clues. For better transferability, we further “deceive” the diffusion model which can be viewed as an additional recognition surrogate, by distracting its attention away from the target regions. To our knowledge, our proposed method, DiffAttack, is the first that introduces diffusion models into adversarial attack field. Extensive experiments on various model structures (including CNNs, Transformers, MLPs) and defense methods have demonstrated our superiority over other attack methods.
PDF Code Page: https://github.com/WindVChen/DiffAttack

点此查看论文截图

TESS: Text-to-Text Self-Conditioned Simplex Diffusion

Authors:Rabeeh Karimi Mahabadi, Jaesung Tae, Hamish Ivison, James Henderson, Iz Beltagy, Matthew E. Peters, Arman Cohan

Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various domains with continuous-valued inputs. Despite the promises of fully non-autoregressive text generation, applying diffusion models to natural language remains challenging due to its discrete nature. In this work, we propose Text-to-text Self-conditioned Simplex Diffusion (TESS), a text diffusion model that is fully non-autoregressive, employs a new form of self-conditioning, and applies the diffusion process on the logit simplex space rather than the typical learned embedding space. Through extensive experiments on natural language understanding and generation tasks including summarization, text simplification, paraphrase generation, and question generation, we demonstrate that TESS outperforms state-of-the-art non-autoregressive models and is competitive with pretrained autoregressive sequence-to-sequence models.
PDF 9 pages, 4 figures, preprint

点此查看论文截图

A Reproducible Extraction of Training Images from Diffusion Models

Authors:Ryan Webster

Recently, Carlini et al. demonstrated the widely used model Stable Diffusion can regurgitate real training samples, which is troublesome from a copyright perspective. In this work, we provide an efficient extraction attack on par with the recent attack, with several order of magnitudes less network evaluations. In the process, we expose a new phenomena, which we dub template verbatims, wherein a diffusion model will regurgitate a training sample largely in tact. Template verbatims are harder to detect as they require retrieval and masking to correctly label. Furthermore, they are still generated by newer systems, even those which de-duplicate their training set, and we give insight into why they still appear during generation. We extract training images from several state of the art systems, including Stable Diffusion 2.0, Deep Image Floyd, and finally Midjourney v4. We release code to verify our extraction attack, perform the attack, as well as all extracted prompts at \url{https://github.com/ryanwebster90/onestep-extraction}.
PDF

点此查看论文截图

Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models

Authors:Antoni Bigata Casademunt, Rodrigo Mira, Nikita Drobyshev, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

Speech-driven animation has gained significant traction in recent years, with current methods achieving near-photorealistic results. However, the field remains underexplored regarding non-verbal communication despite evidence demonstrating its importance in human interaction. In particular, generating laughter sequences presents a unique challenge due to the intricacy and nuances of this behaviour. This paper aims to bridge this gap by proposing a novel model capable of generating realistic laughter sequences, given a still portrait and an audio clip containing laughter. We highlight the failure cases of traditional facial animation methods and leverage recent advances in diffusion models to produce convincing laughter videos. We train our model on a diverse set of laughter datasets and introduce an evaluation metric specifically designed for laughter. When compared with previous speech-driven approaches, our model achieves state-of-the-art performance across all metrics, even when these are re-trained for laughter generation.
PDF

点此查看论文截图

文章作者: 木子已
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 木子已 !
  目录