发布日期: 2023-03-17

2023-03-17 更新

DeepMIM: Deep Supervision for Masked Image Modeling

Authors:Sucheng Ren, Fangyun Wei, Samuel Albanie, Zheng Zhang, Han Hu

Deep supervision, which involves extra supervisions to the intermediate features of a neural network, was widely used in image classification in the early deep learning era since it significantly reduces the training difficulty and eases the optimization like avoiding gradient vanish over the vanilla training. Nevertheless, with the emergence of normalization techniques and residual connection, deep supervision in image classification was gradually phased out. In this paper, we revisit deep supervision for masked image modeling (MIM) that pre-trains a Vision Transformer (ViT) via a mask-and-predict scheme. Experimentally, we find that deep supervision drives the shallower layers to learn more meaningful representations, accelerates model convergence, and expands attention diversities. Our approach, called DeepMIM, significantly boosts the representation capability of each layer. In addition, DeepMIM is compatible with many MIM models across a range of reconstruction targets. For instance, using ViT-B, DeepMIM on MAE achieves 84.2 top-1 accuracy on ImageNet, outperforming MAE by +0.6. By combining DeepMIM with a stronger tokenizer CLIP, our model achieves state-of-the-art performance on various downstream tasks, including image classification (85.6 top-1 accuracy on ImageNet-1K, outperforming MAE-CLIP by +0.8), object detection (52.8 APbox on COCO) and semantic segmentation (53.1 mIoU on ADE20K). Code and models are available at https://github.com/OliverRensu/DeepMIM.
PDF Code and models are available at https://github.com/OliverRensu/DeepMIM

点此查看论文截图

木子已

https://ipaper.today/2023/03/17/2023-03-17-vision-transformer/

本博客所有文章除特別声明外，均采用 CC BY 4.0 许可协议。转载请注明来源木子已 !

Vision Transformer

检测/分割/跟踪

2023-03-17 检测/分割/跟踪

检测分割跟踪

I2I Translation

2023-03-17 I2I Translation

I2I Translation

Vision Transformer

2023-03-17 更新

DeepMIM: Deep Supervision for Masked Image Modeling

打赏用于支持本站流量费