Vision Transformer

Published: 2023-01-13

Updated 2023-01-13

ViTs for SITS: Vision Transformers for Satellite Image Time Series

Authors: Michail Tarasiou, Erik Chavez, Stefanos Zafeiriou

In this paper we introduce the Temporo-Spatial Vision Transformer (TSViT), a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT). TSViT splits a SITS record into non-overlapping patches in space and time, which are tokenized and subsequently processed by a factorized temporo-spatial encoder. We argue that, in contrast to natural images, a temporal-then-spatial factorization is more intuitive for SITS processing, and we present experimental evidence for this claim. Additionally, we enhance the model's discriminative power by introducing two novel mechanisms: acquisition-time-specific temporal positional encodings and multiple learnable class tokens. The effect of all novel design choices is evaluated through an extensive ablation study. Our proposed architecture achieves state-of-the-art performance, surpassing previous approaches by a significant margin on three publicly available SITS semantic segmentation and classification datasets. All model, training, and evaluation code is made publicly available to facilitate further research.
PDF 11 pages, 5 figures, 2 tables
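
To make the tokenization and the temporal-then-spatial ordering described in the abstract more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `tokenize_sits` and `factorized_order`, the `(T, H, W, C)` input layout, and the patch sizes are assumptions chosen for illustration, and a plain mean over time stands in for the temporal attention layers and class tokens of the real model.

```python
import numpy as np

def tokenize_sits(x, patch_t, patch_hw):
    """Split a SITS record of shape (T, H, W, C) into non-overlapping
    patches in time and space, then flatten each patch into one token.

    Returns an array of shape (T // patch_t, N, D) with
    N = (H // patch_hw) * (W // patch_hw) and
    D = patch_t * patch_hw * patch_hw * C.
    """
    T, H, W, C = x.shape
    if T % patch_t or H % patch_hw or W % patch_hw:
        raise ValueError("T, H, W must be divisible by the patch sizes")
    nt, nh, nw = T // patch_t, H // patch_hw, W // patch_hw
    # Expose the patch axes, then move them next to the channel axis.
    x = x.reshape(nt, patch_t, nh, patch_hw, nw, patch_hw, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # (nt, nh, nw, pt, p, p, C)
    return x.reshape(nt, nh * nw, patch_t * patch_hw * patch_hw * C)

def factorized_order(tokens):
    """Arrange tokens for a temporal-then-spatial factorization.

    Temporal stage input: one sequence over time per spatial location,
    shape (N, nt, D). Spatial stage input: one sequence over locations,
    shape (1, N, D). The mean over time is only a placeholder for a
    temporal encoder (assumption, not the paper's mechanism).
    """
    temporal_in = tokens.transpose(1, 0, 2)       # (N, nt, D)
    temporal_out = temporal_in.mean(axis=1)       # placeholder reduction
    spatial_in = temporal_out[np.newaxis, :, :]   # (1, N, D)
    return temporal_in, spatial_in

# Hypothetical record: 4 acquisitions of an 8x8 image with 3 bands.
x = np.arange(4 * 8 * 8 * 3, dtype=np.float32).reshape(4, 8, 8, 3)
tokens = tokenize_sits(x, patch_t=2, patch_hw=4)
temporal_in, spatial_in = factorized_order(tokens)
print(tokens.shape, temporal_in.shape, spatial_in.shape)
```

In this sketch the temporal stage sees each spatial location as its own short sequence, and only the per-location summaries are passed on to the spatial stage, which is the ordering the abstract argues for.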


木子已

https://ipaper.today/2023/01/13/2023-01-13-vision-transformer/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. When reposting, please credit the source: 木子已.
