Vision Transformer

Published: 2023-01-06

2023-01-06 Update

Skip-Attention: Improving Vision Transformers by Paying Less Attention

Authors: Shashanka Venkataramanan, Amir Ghodrati, Yuki M. Asano, Fatih Porikli, Amirhossein Habibian

This work aims to improve the efficiency of vision transformers (ViT). While ViTs use computationally expensive self-attention operations in every layer, we identify that these operations are highly correlated across layers — a key redundancy that causes unnecessary computations. Based on this observation, we propose SkipAt, a method to reuse self-attention computation from preceding layers to approximate attention at one or more subsequent layers. To ensure that reusing self-attention blocks across layers does not degrade the performance, we introduce a simple parametric function, which outperforms the baseline transformer’s performance while running computationally faster. We show the effectiveness of our method in image classification and self-supervised learning on ImageNet-1K, semantic segmentation on ADE20K, image denoising on SIDD, and video denoising on DAVIS. We achieve improved throughput at the same-or-higher accuracy levels in all these tasks.
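The reuse idea in the abstract can be illustrated with a small pure-Python sketch. This is not the authors' implementation. It assumes single-head attention with identity Q/K/V projections, omits the MLP and normalization layers, and uses a single linear map (the hypothetical `skip_attention` with a `weight` matrix) as a stand-in for the "simple parametric function", whose actual form the abstract does not specify. All function names here are illustrative.

```python
import math

def matmul(a, b):
    # Plain list-of-lists matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    # Simplified attention: identity Q/K/V projections (an assumption).
    d = len(x[0])
    scores = matmul(x, [list(col) for col in zip(*x)])  # x @ x^T
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, x)

def skip_attention(cached, weight):
    # Hypothetical parametric function: one linear map over the feature
    # dimension, applied to the attention output cached from the previous layer.
    return matmul(cached, weight)

def forward(x, num_layers, skip_layers, weight):
    # Runs residual attention layers; layers listed in skip_layers reuse the
    # previous layer's attention output instead of recomputing attention.
    calls = 0
    cached = None
    for layer in range(num_layers):
        if layer in skip_layers and cached is not None:
            attn = skip_attention(cached, weight)
        else:
            attn = self_attention(x)
            calls += 1
        cached = attn
        x = [[xi + ai for xi, ai in zip(xr, ar)] for xr, ar in zip(x, attn)]
    return x, calls

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 tokens, 2 features
identity = [[1.0, 0.0], [0.0, 1.0]]             # placeholder weights
out, calls = forward(tokens, 4, {1, 2}, identity)
_, baseline_calls = forward(tokens, 4, set(), identity)
```

In this toy setup, skipping layers 1 and 2 of a 4-layer stack halves the number of full attention computations. Whether accuracy is preserved depends on the learned parametric function, which this sketch does not train.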

木子已

https://ipaper.today/2023/01/06/2023-01-06-vision-transformer/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. When reposting, please credit the source: 木子已.
