Vision Transformer


Updated 2022-04-27

Deeper Insights into ViTs Robustness towards Common Corruptions

Authors: Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yugang Jiang

Recent literature has shown that design strategies from Convolutional Neural Networks (CNNs) benefit Vision Transformers (ViTs) in various vision tasks. However, it remains unclear how these design choices affect robustness when transferred to ViTs. In this paper, we make the first attempt to investigate how CNN-like architectural designs and CNN-based data augmentation strategies affect ViTs' robustness to common corruptions through extensive and rigorous benchmarking. We demonstrate that overlapping patch embedding and a convolutional Feed-Forward Network (FFN) improve robustness. Furthermore, adversarial noise training is powerful on ViTs, while Fourier-domain augmentation fails. Moreover, we introduce a novel conditional method that enables input-varied augmentations from two angles: (1) generating dynamic augmentation parameters conditioned on input images, which leads to state-of-the-art robustness through conditional convolutions; and (2) selecting the most suitable augmentation strategy with an extra predictor, which helps achieve the best trade-off between clean accuracy and robustness.
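
To make "overlapping patch embedding" concrete, here is a minimal pure-Python sketch of the patch-splitting step only. It is an illustration under stated assumptions, not the paper's implementation: the helper name `extract_patches`, the toy 8x8 image, and the patch size and stride values are all hypothetical, and the learned linear projection and any padding that a real patch-embedding layer would apply are omitted. The only idea shown is that choosing a stride smaller than the patch size makes neighbouring patches share pixels.

```python
def extract_patches(image, patch_size, stride):
    """Split a 2D list `image` into patch_size x patch_size patches.

    stride == patch_size gives the usual non-overlapping split;
    stride < patch_size gives overlapping patches.
    (Hypothetical helper for illustration; no projection, no padding.)
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


# Toy 8x8 "image" whose pixel values are just their flat indices.
image = [[r * 8 + c for c in range(8)] for r in range(8)]

# Non-overlapping: stride equals the patch size.
non_overlapping = extract_patches(image, patch_size=4, stride=4)

# Overlapping: stride is half the patch size, so adjacent patches share pixels.
overlapping = extract_patches(image, patch_size=4, stride=2)

print(len(non_overlapping), len(overlapping))
```

With these toy settings the overlapping split yields more patches than the non-overlapping one, so the token sequence gets longer; that extra cost is the price of sharing local context between neighbouring tokens.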

Boosting Adversarial Transferability of MLP-Mixer

Authors: Haoran Lyu, Yajie Wang, Yu-an Tan, Huipeng Zhou, Yuhang Zhao, Quanxin Zhang

The security of models based on new architectures such as MLP-Mixer and ViTs urgently needs to be studied. However, most current research targets adversarial attacks against ViTs, and there is still relatively little adversarial work on MLP-Mixer. We propose an adversarial attack method against MLP-Mixer called Maxwell's demon Attack (MA). MA breaks the channel-mixing and token-mixing mechanisms of MLP-Mixer by controlling part of the input to each Mixer layer, which disrupts MLP-Mixer's ability to capture the main information of images. Our method masks part of the input to the Mixer layer, avoids overfitting the adversarial examples to the source model, and improves cross-architecture transferability. Extensive experimental evaluation demonstrates the effectiveness and superior performance of the proposed MA. Our method can be easily combined with existing methods and improves transferability by up to 38.0% on the MLP-based ResMLP. Adversarial examples produced by our method on MLP-Mixer exceed the transferability of adversarial examples produced using DenseNet against CNNs. To the best of our knowledge, this is the first work to study the adversarial transferability of MLP-Mixer.
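
The abstract does not spell out how MA chooses which part of a Mixer layer's input to mask, so the following is only a hypothetical sketch of the general idea of "masking part of the input", not the authors' algorithm. The function name `mask_tokens`, the `keep_ratio` parameter, the uniform random selection, and the toy token matrix are all assumptions made for illustration.

```python
import random


def mask_tokens(tokens, keep_ratio, rng):
    """Zero out a random subset of token rows, keeping about keep_ratio of them.

    `tokens` is a list of token vectors (one row per patch). Hypothetical
    helper: the selection rule here is uniform random, which is an assumption.
    """
    num_tokens = len(tokens)
    num_keep = max(1, round(num_tokens * keep_ratio))
    kept = set(rng.sample(range(num_tokens), num_keep))
    return [row[:] if i in kept else [0.0] * len(row)
            for i, row in enumerate(tokens)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Toy input to a Mixer layer: 8 tokens, each with 3 channels, all non-zero.
tokens = [[float(i + 1)] * 3 for i in range(8)]

masked = mask_tokens(tokens, keep_ratio=0.5, rng=rng)
num_surviving = sum(1 for row in masked if any(row))
print(num_surviving, "of", len(masked), "tokens kept")
```

In an actual attack, a mask like this would be applied inside the forward pass while computing gradients for the adversarial perturbation, so that the perturbation cannot rely on every token the source model sees. That wiring is not shown here.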

Author: 木子已
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit 木子已 as the source when reposting.