Vision Transformer

Published: 2023-04-03

Updated 2023-04-03

Rethinking Local Perception in Lightweight Vision Transformer

Authors: Qihang Fan, Huaibo Huang, Jiyang Guan, Ran He

Vision Transformers (ViTs) have been shown to be effective in various vision tasks. However, resizing them to a mobile-friendly size leads to significant performance degradation. Therefore, developing lightweight vision transformers has become a crucial area of research. This paper introduces CloFormer, a lightweight vision transformer that leverages context-aware local enhancement. CloFormer explores the relationship between the globally shared weights typically used in vanilla convolutional operators and the token-specific, context-aware weights that appear in attention, then proposes an effective and straightforward module to capture high-frequency local information. In CloFormer, we introduce AttnConv, an attention-style convolution operator. AttnConv uses shared weights to aggregate local information and deploys carefully designed context-aware weights to enhance local features. Combining AttnConv with vanilla attention, which uses pooling to reduce FLOPs, enables CloFormer to perceive both high-frequency and low-frequency information. Extensive experiments on image classification, object detection, and semantic segmentation demonstrate the superiority of CloFormer.
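
The abstract describes two ideas: a local branch that aggregates neighbors with shared weights and then rescales the result with token-specific weights, and a global branch that runs attention over pooled keys and values. The following is a minimal pure-Python sketch of that combination, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the 1-D token sequence (the paper works on images), the tanh gate computed from the element-wise product of q and k, average pooling, and the channel-wise concatenation of the two branches.

```python
import math
import random

def depthwise_conv1d(x, kernel):
    """Shared-weight local aggregation over a 1-D token sequence.

    x: list of n tokens, each a list of c floats.
    kernel: list of k rows (k odd), each a list of c per-channel weights.
    The same kernel is applied at every position; borders are zero-padded.
    """
    n, c, pad = len(x), len(x[0]), len(kernel) // 2
    out = []
    for i in range(n):
        row = [0.0] * c
        for j, w in enumerate(kernel):
            t = i + j - pad
            if 0 <= t < n:
                for ch in range(c):
                    row[ch] += w[ch] * x[t][ch]
        out.append(row)
    return out

def local_branch(q, k, v, kernel):
    """Gate shared-weight local features with token-specific weights.

    Assumption: the gate is tanh(q * k / sqrt(c)), so it lies in (-1, 1).
    """
    scale = math.sqrt(len(q[0]))
    v_local = depthwise_conv1d(v, kernel)
    return [[math.tanh(qi * ki / scale) * vi for qi, ki, vi in zip(qr, kr, vr)]
            for qr, kr, vr in zip(q, k, v_local)]

def avg_pool(x, stride):
    """Average consecutive groups of `stride` tokens (leftover tokens dropped)."""
    return [[sum(col) / stride for col in zip(*x[i:i + stride])]
            for i in range(0, len(x) - stride + 1, stride)]

def global_branch(q, k, v, stride):
    """Softmax attention in which keys and values are pooled to cut cost."""
    k_p, v_p = avg_pool(k, stride), avg_pool(v, stride)
    scale = math.sqrt(len(q[0]))
    out = []
    for qr in q:
        scores = [sum(a * b for a, b in zip(qr, kr)) / scale for kr in k_p]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * vr[ch] for e, vr in zip(exps, v_p))
                    for ch in range(len(v_p[0]))])
    return out

def rand_tokens(n, c):
    """Hypothetical helper: n random tokens with c channels each."""
    return [[random.gauss(0, 1) for _ in range(c)] for _ in range(n)]

random.seed(0)
n_tokens, channels = 8, 4
q, k, v = (rand_tokens(n_tokens, channels) for _ in range(3))
kernel = rand_tokens(3, channels)

# Concatenate the two branches channel-wise, one row per token.
out = [a + b for a, b in zip(local_branch(q, k, v, kernel),
                             global_branch(q, k, v, stride=2))]
print(len(out), len(out[0]))  # 8 8
```

With a pooling stride of 2, each query attends to 4 pooled positions instead of 8, which is where the FLOP saving in the global branch comes from in this sketch; the local branch keeps full resolution, so fine detail is handled there.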

木子已

https://ipaper.today/2023/04/03/2023-04-03-vision-transformer/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit 木子已 as the source when reposting.

