Vision Transformer

发布日期: 2023-01-03

2023-01-03 更新

Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling

Authors:Xin Ma, Chang Liu, Chunyu Xie, Long Ye, Yafeng Deng, Xiangyang Ji

Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL) yet been criticized for learning inefficiency. We believe the insufficient utilization of training signals should be responsible. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch with the disjoint regulation to raise the usage of tokens for reconstruction in each image while keeping the masking rate of each view. For joint distillation (JD), we adopt a dual branch architecture to respectively predict invisible (masked) and visible (unmasked) tokens with superior learning targets. Rooting in orthogonal perspectives for training efficiency improvement, DM and JD cooperatively accelerate the training convergence yet not sacrificing the model generalization ability. Concretely, DM can train ViT with half of the effective training epochs (3.7 times less time-consuming) to report competitive performance. With JD, our DMJD clearly improves the linear probing classification accuracy over ConvMAE by 5.8%. On fine-grained downstream tasks like semantic segmentation, object detection, etc., our DMJD also presents superior generalization compared with state-of-the-art SSL methods. The code and model will be made public at https://github.com/mx-mark/DMJD.
PDF

点此查看论文截图

木子已

https://ipaper.today/2023/01/03/2023-01-03-vision-transformer/

本博客所有文章除特別声明外，均采用 CC BY 4.0 许可协议。转载请注明来源木子已 !

Vision Transformer

检测/分割/跟踪

2023-01-03 检测/分割/跟踪

检测分割跟踪

I2I Translation

2023-01-03 I2I Translation

I2I Translation

Vision Transformer

2023-01-03 更新

Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling

打赏用于支持本站流量费