Vision Transformer

Published: 2023-01-05

2023-01-05 Update

Semi-MAE: Masked Autoencoders for Semi-supervised Vision Transformers

Authors: Haojie Yu, Kang Zhao, Xiaoming Xu

Vision Transformers (ViTs) suffer from data scarcity in semi-supervised learning (SSL). To alleviate this issue, and inspired by the masked autoencoder (MAE), a data-efficient self-supervised learner, we propose Semi-MAE, a pure ViT-based SSL framework that adds a parallel MAE branch to assist visual representation learning and make the pseudo labels more accurate. The MAE branch uses an asymmetric architecture consisting of a lightweight decoder and a shared-weight encoder. We feed weakly augmented unlabeled data with a high masking ratio to the MAE branch and reconstruct the missing pixels. Semi-MAE achieves 75.9% top-1 accuracy on ImageNet with 10% of the labels, surpassing the prior state of the art in semi-supervised image classification. In addition, extensive experiments show that Semi-MAE can be readily applied to other ViT models and masked image modeling methods.
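The masking and reconstruction step of the MAE branch can be illustrated with a small, framework-free sketch. It assumes MAE's usual uniform random patch masking and a mean-squared-error loss computed only on masked patches. The 0.75 masking ratio and the 196-patch grid (a 224×224 image split into 16×16 patches) are illustrative assumptions rather than values stated in the abstract, and the function names `random_masking`, `gather`, and `masked_reconstruction_loss` are hypothetical. This is not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

Patch = List[float]  # flattened pixel values of one image patch


def random_masking(num_patches: int, mask_ratio: float,
                   rng: random.Random) -> Tuple[List[int], List[int]]:
    """MAE-style uniform random masking: split patch indices into (visible, masked)."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)  # random permutation of patch indices
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def gather(patches: Sequence[Patch], indices: Sequence[int]) -> List[Patch]:
    """Select the patches the (shared-weight) encoder would actually see."""
    return [list(patches[i]) for i in indices]


def masked_reconstruction_loss(target: Sequence[Patch], pred: Sequence[Patch],
                               masked: Sequence[int]) -> float:
    """Mean squared error averaged over the pixels of masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for t, p in zip(target[i], pred[i]):
            total += (t - p) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed setup: a weakly augmented 224x224 unlabeled image cut into
    # 16x16 patches gives 14 * 14 = 196 patches (augmentation not shown).
    visible, masked = random_masking(196, 0.75, rng)
    print(len(visible), len(masked))  # 49 147

    # Toy 2-patch "image": only patch 0 is masked, so only its error counts.
    target = [[1.0, 1.0], [0.0, 0.0]]
    pred = [[0.0, 0.0], [5.0, 5.0]]
    print(masked_reconstruction_loss(target, pred, [0]))  # 1.0
```

In MAE, the encoder processes only the visible patches (here 49 of 196), which keeps the extra encoder cost of the reconstruction branch low; the lightweight decoder then predicts pixels for every position, and only the masked positions contribute to the loss.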


木子已

https://ipaper.today/2023/01/05/2023-01-05-vision-transformer/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit 木子已 as the source when reposting!
