Vision Transformer

Published: 2022-05-29

2022-05-29 Update

Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels

Authors: Tianxin Tao, Daniele Reda, Michiel van de Panne

Vision Transformers (ViT) have recently demonstrated the significant potential of transformer architectures for computer vision. To what extent can image-based deep reinforcement learning also benefit from ViT architectures, as compared to standard convolutional neural network (CNN) architectures? To answer this question, we evaluate ViT training methods for image-based reinforcement learning (RL) control tasks and compare these results to a leading convolutional-network architecture method, RAD. For training the ViT encoder, we consider several recently-proposed self-supervised losses that are treated as auxiliary tasks, as well as a baseline with no additional loss terms. We find that the CNN architectures trained using RAD still generally provide superior performance. For the ViT methods, all three types of auxiliary tasks that we consider provide a benefit over plain ViT training. Furthermore, ViT reconstruction-based tasks are found to significantly outperform ViT contrastive-learning.
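The abstract's core setup has two parts. A ViT encoder reads pixel observations as a sequence of image patches. Its training signal is the RL loss, optionally plus a weighted self-supervised auxiliary loss. The following is a minimal, framework-free sketch of that structure, not the authors' code. The function names `patchify`, `masked_reconstruction_loss` and `total_loss` and the weight `aux_weight` are hypothetical. The masked-patch MSE is one generic reconstruction-style objective chosen for illustration; the exact reconstruction and contrastive losses evaluated in the paper may differ.

```python
from typing import List, Sequence

Frame = List[List[List[float]]]  # H x W x C pixel observation (nested lists)


def patchify(frame: Frame, patch: int) -> List[List[float]]:
    """Split an H x W x C frame into non-overlapping patch x patch tokens.

    Each token holds the flattened pixel values of one patch. Patches are
    emitted in row-major order, mirroring the ViT input step that precedes
    the learned linear projection (which is omitted here).
    """
    h, w = len(frame), len(frame[0])
    if h % patch or w % patch:
        raise ValueError("frame size must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            token = [
                value
                for row in frame[top:top + patch]
                for pixel in row[left:left + patch]
                for value in pixel
            ]
            tokens.append(token)
    return tokens


def masked_reconstruction_loss(
    targets: Sequence[Sequence[float]],
    predictions: Sequence[Sequence[float]],
    mask: Sequence[bool],
) -> float:
    """Mean squared error over the masked (hidden) patches only."""
    errors = [
        (p - t) ** 2
        for hidden, tgt, pred in zip(mask, targets, predictions)
        if hidden
        for t, p in zip(tgt, pred)
    ]
    return sum(errors) / len(errors) if errors else 0.0


def total_loss(rl_loss: float, aux_loss: float, aux_weight: float) -> float:
    """RL objective plus a weighted auxiliary self-supervised term.

    aux_weight = 0.0 reproduces the plain-ViT baseline that uses no
    additional loss terms.
    """
    return rl_loss + aux_weight * aux_loss
```

Under this framing, swapping a reconstruction task for a contrastive one changes only how `aux_loss` is computed. The RL objective and the way the two terms are combined stay the same.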

Author: 木子已

Link: https://ipaper.today/2022/05/29/2022-05-29-vision-transformer/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit the source (木子已) when reposting!
