Few-Shot


2023-04-14 更新

Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent Transportation

Authors:Yifeng Shi, Feng Lv, Xinliang Wang, Chunlong Xia, Shaojie Li, Shujie Yang, Teng Xi, Gang Zhang

With the continuous improvement of computing power and deep learning algorithms in recent years, the foundation model has grown in popularity. Because of its powerful capabilities and excellent performance, this technology is being adopted and applied by an increasing number of industries. In the intelligent transportation industry, artificial intelligence faces the following typical challenges: few shots, poor generalization, and a lack of multi-modal techniques. Foundation model technology can significantly alleviate the aforementioned issues. To address these, we designed the 1st Foundation Model Challenge, with the goal of increasing the popularity of foundation model technology in traffic scenarios and promoting the rapid development of the intelligent transportation industry. The challenge is divided into two tracks: all-in-one and cross-modal image retrieval. Furthermore, we provide a new baseline and benchmark for the two tracks, called Open-TransMind. According to our knowledge, Open-TransMind is the first open-source transportation foundation model with multi-task and multi-modal capabilities. Simultaneously, Open-TransMind can achieve state-of-the-art performance on detection, classification, and segmentation datasets of traffic scenarios. Our source code is available at https://github.com/Traffic-X/Open-TransMind.
PDF

点此查看论文截图

[CLS] Token is All You Need for Zero-Shot Semantic Segmentation

Authors:Letian Wu, Wenyao Zhang, Tengping Jiang, Wankou Yang, Xin Jin, Wenjun Zeng

In this paper, we propose an embarrassingly simple yet highly effective zero-shot semantic segmentation (ZS3) method, based on the pre-trained vision-language model CLIP. First, our study provides a couple of key discoveries: (i) the global tokens (a.k.a [CLS] tokens in Transformer) of the text branch in CLIP provide a powerful representation of semantic information and (ii) these text-side [CLS] tokens can be regarded as category priors to guide CLIP visual encoder pay more attention on the corresponding region of interest. Based on that, we build upon the CLIP model as a backbone which we extend with a One-Way [CLS] token navigation from text to the visual branch that enables zero-shot dense prediction, dubbed \textbf{ClsCLIP}. Specifically, we use the [CLS] token output from the text branch, as an auxiliary semantic prompt, to replace the [CLS] token in shallow layers of the ViT-based visual encoder. This one-way navigation embeds such global category prior earlier and thus promotes semantic segmentation. Furthermore, to better segment tiny objects in ZS3, we further enhance ClsCLIP with a local zoom-in strategy, which employs a region proposal pre-processing and we get ClsCLIP+. Extensive experiments demonstrate that our proposed ZS3 method achieves a SOTA performance, and it is even comparable with those few-shot semantic segmentation methods.
PDF 8 pages,6 figures

点此查看论文截图

Out-of-distribution Few-shot Learning For Edge Devices without Model Fine-tuning

Authors:Xinyun Zhang, Lanqing Hong

Few-shot learning (FSL) via customization of a deep learning network with limited data has emerged as a promising technique to achieve personalized user experiences on edge devices. However, existing FSL methods primarily assume independent and identically distributed (IID) data and utilize either computational backpropagation updates for each task or a common model with task-specific prototypes. Unfortunately, the former solution is infeasible for edge devices that lack on-device backpropagation capabilities, while the latter often struggles with limited generalization ability, especially for out-of-distribution (OOD) data. This paper proposes a lightweight, plug-and-play FSL module called Task-aware Normalization (TANO) that enables efficient and task-aware adaptation of a deep neural network without backpropagation. TANO covers the properties of multiple user groups by coordinating the updates of several groups of the normalization statistics during meta-training and automatically identifies the appropriate normalization group for a downstream few-shot task. Consequently, TANO provides stable but task-specific estimations of the normalization statistics to close the distribution gaps and achieve efficient model adaptation. Results on both intra-domain and out-of-domain generalization experiments demonstrate that TANO outperforms recent methods in terms of accuracy, inference speed, and model size. Moreover, TANO achieves promising results on widely-used FSL benchmarks and data from real applications.
PDF

点此查看论文截图

文章作者: 木子已
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 木子已 !
  目录