Vision Transformer


2023-04-07 更新

Towards an Effective and Efficient Transformer for Rain-by-snow Weather Removal

Authors:Tao Gao, Yuanbo Wen, Kaihao Zhang, Peng Cheng, Ting Chen

Rain-by-snow weather removal is a specialized task in weather-degraded image restoration aiming to eliminate coexisting rain streaks and snow particles. In this paper, we propose RSFormer, an efficient and effective Transformer that addresses this challenge. Initially, we explore the proximity of convolution networks (ConvNets) and vision Transformers (ViTs) in hierarchical architectures and experimentally find they perform approximately at intra-stage feature learning. On this basis, we utilize a Transformer-like convolution block (TCB) that replaces the computationally expensive self-attention while preserving attention characteristics for adapting to input content. We also demonstrate that cross-stage progression is critical for performance improvement, and propose a global-local self-attention sampling mechanism (GLASM) that down-/up-samples features while capturing both global and local dependencies. Finally, we synthesize two novel rain-by-snow datasets, RSCityScape and RS100K, to evaluate our proposed RSFormer. Extensive experiments verify that RSFormer achieves the best trade-off between performance and time-consumption compared to other restoration methods. For instance, it outperforms Restormer with a 1.53% reduction in the number of parameters and a 15.6% reduction in inference time. Datasets, source code and pre-trained models are available at \url{https://github.com/chdwyb/RSFormer}.
PDF code is available at \url{https://github.com/chdwyb/RSFormer}

点此查看论文截图

InterFormer: Real-time Interactive Image Segmentation

Authors:You Huang, Hao Yang, Ke Sun, Shengchuan Zhang, Guannan Jiang, Rongrong Ji, Liujuan Cao

Interactive image segmentation enables annotators to efficiently perform pixel-level annotation for segmentation tasks. However, the existing interactive segmentation pipeline suffers from inefficient computations of interactive models because of the following two issues. First, annotators’ later click is based on models’ feedback of annotators’ former click. This serial interaction is unable to utilize model’s parallelism capabilities. Second, the model has to repeatedly process the image, the annotator’s current click, and the model’s feedback of the annotator’s former clicks at each step of interaction, resulting in redundant computations. For efficient computation, we propose a method named InterFormer that follows a new pipeline to address these issues. InterFormer extracts and preprocesses the computationally time-consuming part i.e. image processing from the existing process. Specifically, InterFormer employs a large vision transformer (ViT) on high-performance devices to preprocess images in parallel, and then uses a lightweight module called interactive multi-head self attention (I-MSA) for interactive segmentation. Furthermore, the I-MSA module’s deployment on low-power devices extends the practical application of interactive segmentation. The I-MSA module utilizes the preprocessed features to efficiently response to the annotator inputs in real-time. The experiments on several datasets demonstrate the effectiveness of InterFormer, which outperforms previous interactive segmentation models in terms of computational efficiency and segmentation quality, achieve real-time high-quality interactive segmentation on CPU-only devices.
PDF

点此查看论文截图

文章作者: 木子已
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 木子已 !
  目录