Few-Shot


Updated 2022-10-05

Metric Distribution to Vector: Constructing Data Representation via Broad-Scale Discrepancies

Authors: Xue Liu, Dan Sun, Xiaobo Cao, Hao Ye, Wei Wei

Graph embedding provides a feasible methodology for pattern classification on graph-structured data by mapping each datum into a vector space. Various pioneering works are essentially coding methods that concentrate on a vectorial representation of the inner properties of a graph, such as its topological constitution, node attributes, and link relations. However, classifying each target datum is a qualitative question that rests on understanding the overall discrepancies across the dataset. From a statistical point of view, these discrepancies manifest as a metric distribution over the dataset when a distance metric is adopted to measure pairwise similarity or dissimilarity. Therefore, we present a novel embedding strategy named $\mathbf{MetricDistribution2vec}$ that extracts such distribution characteristics into the vectorial representation of each datum. We demonstrate the application and effectiveness of our representation method in supervised prediction tasks on extensive real-world structural graph datasets. The results show unexpected gains over a large set of baselines on all the datasets, even when lightweight models are used as classifiers. Moreover, we also conduct experiments in few-shot classification scenarios, where the results still show attractive discrimination when inference is based on rare training samples.
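The core idea lends itself to a compact sketch: given a precomputed pairwise distance matrix over the dataset, each datum can be represented by the histogram of its distances to all other data. The function name, bin count, and normalization below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def metric_distribution_vector(index, distance_matrix, n_bins=32):
    """Represent one datum by the (normalized) histogram of its
    distances to every other datum in the dataset."""
    d = np.delete(distance_matrix[index], index)  # drop the self-distance
    hist, _ = np.histogram(d, bins=n_bins,
                           range=(0.0, float(distance_matrix.max())))
    return hist / max(hist.sum(), 1)  # normalize to a distribution
```

Any graph distance (e.g., an edit or kernel-induced distance) can populate `distance_matrix`; the resulting fixed-length vectors can then be fed to a lightweight classifier, matching the setting described in the abstract.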
PDF

Click here to view paper screenshots

SHRED: 3D Shape Region Decomposition with Learned Local Operations

Authors: R. Kenny Jones, Aalia Habib, Daniel Ritchie

We present SHRED, a method for 3D SHape REgion Decomposition. SHRED takes a 3D point cloud as input and uses learned local operations to produce a segmentation that approximates fine-grained part instances. We endow SHRED with three decomposition operations: splitting regions, fixing the boundaries between regions, and merging regions together. Modules are trained independently and locally, allowing SHRED to generate high-quality segmentations for categories not seen during training. We train and evaluate SHRED with fine-grained segmentations from PartNet; using its merge-threshold hyperparameter, we show that SHRED produces segmentations that better respect ground-truth annotations than baseline methods at any desired decomposition granularity. Finally, we demonstrate that SHRED is useful for downstream applications, outperforming all baselines on zero-shot fine-grained part instance segmentation and few-shot fine-grained semantic segmentation when combined with methods that learn to label shape regions.
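As a rough illustration of how a merge-threshold hyperparameter controls granularity, the greedy loop below merges the best-scoring region pair until no pair scores above the threshold. Here `merge_score` stands in for SHRED's learned local merge module, and the control flow itself is an assumption about how such a stage could be driven, not the paper's code.

```python
import numpy as np

def greedy_merge(regions, points, merge_score, threshold):
    """regions: list of index arrays into `points` (an N x 3 point cloud).
    Merge the highest-scoring pair until every pair falls below threshold."""
    while len(regions) > 1:
        best, best_score = None, threshold
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                score = merge_score(points[regions[i]], points[regions[j]])
                if score > best_score:
                    best, best_score = (i, j), score
        if best is None:
            break  # no pair exceeds the merge threshold
        i, j = best
        merged = np.concatenate([regions[i], regions[j]])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)]
        regions.append(merged)
    return regions
```

Raising the threshold keeps more, finer regions; lowering it merges aggressively. This is how a single trained model can serve any desired decomposition granularity.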
PDF SIGGRAPH ASIA 2022

Click here to view paper screenshots

PCC: Paraphrasing with Bottom-k Sampling and Cyclic Learning for Curriculum Data Augmentation

Authors: Hongyuan Lu, Wai Lam

Curriculum Data Augmentation (CDA) improves neural models by presenting synthetic data with increasing difficulty, from easy to hard. However, traditional CDA simply treats the ratio of word perturbation as the difficulty measure and passes through the curriculum only once. This paper presents \textbf{PCC}: \textbf{P}araphrasing with Bottom-k Sampling and \textbf{C}yclic Learning for \textbf{C}urriculum Data Augmentation, a novel CDA framework via paraphrasing that exploits textual paraphrase similarity as the curriculum difficulty measure. We propose a curriculum-aware paraphrase generation module composed of three units: a paraphrase candidate generator with bottom-k sampling, a filtering mechanism, and a difficulty measure. We also propose a cyclic learning strategy that passes through the curriculum multiple times. Bottom-k sampling is proposed to generate super-hard instances for the later curriculum stages. Experimental results on few-shot text classification and dialogue generation indicate that PCC surpasses competitive baselines. Human evaluation and extensive case studies indicate that bottom-k sampling effectively generates super-hard instances, and that PCC significantly improves the baseline dialogue agent.
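The abstract does not spell out the sampler, but one plausible reading of bottom-k sampling is top-k sampling inverted: draw the next token from the k lowest-probability candidates so the generated paraphrase drifts far from the source, yielding a "super-hard" instance. The sketch below encodes that assumption and should not be taken as the paper's exact decoding procedure.

```python
import torch

def bottom_k_next_token(logits, k=50):
    """Sample the next token from the k LOWEST-probability candidates,
    mirroring top-k sampling at the other end of the distribution."""
    probs = torch.softmax(logits, dim=-1)
    low_p, low_idx = torch.topk(probs, k, largest=False)
    low_p = low_p / low_p.sum()                    # renormalize over the bottom k
    choice = torch.multinomial(low_p, num_samples=1)
    return low_idx[choice]
```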
PDF

Click here to view paper screenshots

Offline Handwritten Amharic Character Recognition Using Few-shot Learning

Authors: Mesay Samuel, Lars Schmidt-Thieme, DP Sharma, Abiot Sinamo, Abey Bruck

Few-shot learning is an important but challenging machine learning problem aimed at learning from only a few labeled training examples. It has become an active area of research because deep learning requires huge amounts of labeled data, which is often not feasible in the real world. Learning from a few examples is also an important step towards learning the way humans do. Few-shot learning has shown great promise in many areas of machine learning, particularly in image classification. As it is a recent technique, most researchers focus on understanding and solving its conceptual issues using common image datasets such as Mini-ImageNet and Omniglot. Few-shot learning also opens an opportunity to address low-resource languages like Amharic. In this study, offline handwritten Amharic character recognition using few-shot learning is addressed. In particular, prototypical networks, a popular and simple type of few-shot learning, are implemented as a baseline. Exploiting the row-wise and column-wise similarities of the Amharic alphabet, a novel way of augmenting the training episodes is proposed. The experimental results show that the proposed method outperforms the baseline. This study implements few-shot learning for Amharic characters for the first time. More importantly, its findings open new ways of examining the influence of training episodes in few-shot learning, one of the important issues that needs exploration. The datasets used for this study were collected from native Amharic writers using an Android app developed as part of this study.
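The prototypical-network baseline mentioned above has a standard form: average the support embeddings of each class into a prototype, then classify queries by distance to the prototypes. A generic sketch (any embedding network can supply the features; the paper's contribution, episode augmentation from character similarities, would change only how support sets are composed):

```python
import torch

def prototypical_logits(support, support_labels, queries, n_classes):
    """support: (n_support, dim) embeddings; queries: (n_query, dim).
    Returns negative distances to class prototypes, used as logits."""
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(n_classes)
    ])                                        # (n_classes, dim)
    return -torch.cdist(queries, prototypes)  # nearer prototype => higher logit
```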
PDF PanAfriCon AI 2022 virtual conference paper

Click here to view paper screenshots

Improving ProtoNet for Few-Shot Video Object Recognition: Winner of ORBIT Challenge 2022

Authors: Li Gu, Zhixiang Chi, Huan Liu, Yuanhao Yu, Yang Wang

In this work, we present the winning solution to the ORBIT Few-Shot Video Object Recognition Challenge 2022. Built upon the ProtoNet baseline, our method is improved with three effective techniques: embedding adaptation, a uniform video clip sampler, and invalid frame detection. In addition, we refactor and re-implement the official codebase to encourage modularity, compatibility, and improved performance. Our implementation accelerates data loading in both training and testing.
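Of the three techniques, the uniform video clip sampler is the easiest to illustrate. Below is a minimal guess at its behavior, evenly spaced clip starts across the video rather than random ones; the function and its signature are assumptions, not the authors' code.

```python
import numpy as np

def uniform_clip_starts(n_frames, clip_len, n_clips):
    """Evenly spaced start indices for fixed-length clips across a video."""
    last_start = max(n_frames - clip_len, 0)
    starts = np.linspace(0, last_start, num=n_clips)
    return starts.round().astype(int)

# e.g. uniform_clip_starts(300, 8, 4) -> array([  0,  97, 195, 292])
```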
PDF Winner of ORBIT Challenge 2022

Click here to view paper screenshots

Updated 2022-10-05

Prompt Learning with Optimal Transport for Vision-Language Models

Authors: Guangyi Chen, Weiran Yao, Xiangchen Song, Xinyue Li, Yongming Rao, Kun Zhang

With the increasing attention to large vision-language models such as CLIP, significant effort has been dedicated to building efficient prompts. Unlike conventional methods that learn only a single prompt, we propose to learn multiple comprehensive prompts that describe diverse characteristics of categories, such as intrinsic attributes or extrinsic contexts. However, directly matching each prompt to the same visual feature is problematic, as it pushes the prompts to converge to one point. To solve this problem, we propose to apply optimal transport to match the vision and text modalities. Specifically, we first model the images and the categories with visual and textual feature sets. Then, we apply a two-stage optimization strategy to learn the prompts. In the inner loop, we optimize the optimal transport distance to align visual features and prompts via the Sinkhorn algorithm, while in the outer loop, we learn the prompts by this distance from the supervised data. Extensive experiments are conducted on the few-shot recognition task, and the improvements demonstrate the superiority of our method.
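The inner loop relies on the standard Sinkhorn iterations for entropic optimal transport. A generic version with uniform marginals follows; the regularization strength and iteration count are illustrative defaults, not values from the paper.

```python
import torch

def sinkhorn_plan(cost, eps=0.1, n_iters=100):
    """Entropic OT between uniform marginals; returns the transport plan.
    cost: (n, m) pairwise cost between visual features and prompts."""
    n, m = cost.shape
    K = torch.exp(-cost / eps)          # Gibbs kernel
    a = torch.full((n,), 1.0 / n)       # uniform source marginal
    b = torch.full((m,), 1.0 / m)       # uniform target marginal
    u, v = torch.ones(n), torch.ones(m)
    for _ in range(n_iters):            # alternating scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u.unsqueeze(1) * K * v.unsqueeze(0)  # diag(u) K diag(v)
```

The OT distance used to supervise the prompts is then `(plan * cost).sum()`.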
PDF

Click here to view paper screenshots

PSP: Pre-trained Soft Prompts for Few-Shot Abstractive Summarization

Authors: Xiaochen Liu, Yang Gao, Yu Bai, Jiawei Li, Yinan Hu, Heyan Huang, Boxing Chen

Few-shot abstractive summarization has become a challenging task in natural language generation. To support it, we design a novel soft-prompt architecture coupled with a prompt pre-training plus fine-tuning paradigm that is effective and tunes only an extremely small number of parameters. The soft prompts comprise continuous input embeddings across an encoder and a decoder to fit the structure of the generation models. Importantly, a novel inner prompt placed within the text is introduced to capture document-level information, devoting attention to understanding the document so that the model is better prompted to generate document-related content. The first step of the summarization procedure is prompt pre-training with self-supervised pseudo-data, which teaches the model basic summarizing capabilities. The model is then fine-tuned with few-shot examples. Experimental results on the CNN/DailyMail and XSum datasets show that our method, with only 0.1% of the parameters, outperforms full-model tuning, where all model parameters are tuned. It also surpasses Prompt Tuning by a large margin and delivers competitive results against Prefix-Tuning with 3% of the parameters.
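The basic mechanism of a soft prompt, learnable continuous embeddings prepended to the token embeddings while the backbone stays frozen, can be sketched generically. Sizes below are illustrative; the paper's encoder/decoder prompts and inner-prompt placement add structure beyond this minimal version.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable prompt vectors prepended to the input embeddings."""
    def __init__(self, prompt_len=20, dim=768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)

    def forward(self, token_embeds):              # (batch, seq, dim)
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, token_embeds], dim=1)
```

Only `self.prompt` receives gradients during fine-tuning, which is how the tuned parameter budget stays near 0.1% of the model.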
PDF Accepted as a long paper in COLING 2022

Click here to view paper screenshots

Fine-grained Few-shot Recognition by Deep Object Parsing

Authors: Ruizhao Zhu, Pengkai Zhu, Samarth Mishra, Venkatesh Saligrama

We propose a new method for fine-grained few-shot recognition via deep object parsing. In our framework, an object is made up of K distinct parts, and for each part we learn a dictionary of templates shared across all instances and categories. An object is parsed by estimating the locations of these K parts and a set of active templates that can reconstruct the part features. We recognize a test instance by comparing its active templates and the relative geometry of its part locations against those of the presented few-shot instances. Our method is end-to-end trainable, learning part templates on top of a convolutional backbone. To combat visual distortions such as orientation, pose, and size, we learn templates at multiple scales and, at test time, parse and match instances across these scales. We show that our method is competitive with the state of the art and, by virtue of parsing, enjoys interpretability as well.
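One way to picture "active template" selection is sparse reconstruction against the learned dictionary: rank atoms by correlation with the part feature, keep the top few, and solve for their weights. The sketch below is a loose illustration under those assumptions, not the paper's parser.

```python
import torch

def active_templates(part_feature, dictionary, n_active=3):
    """part_feature: (dim,); dictionary: (n_templates, dim).
    Returns indices of selected templates and their reconstruction weights."""
    scores = dictionary @ part_feature          # correlation with each atom
    top = torch.topk(scores, n_active).indices
    A = dictionary[top].T                       # (dim, n_active)
    coeffs = torch.linalg.lstsq(A, part_feature.unsqueeze(1)).solution
    return top, coeffs.squeeze(1)
```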
PDF

Click here to view paper screenshots

Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment

Authors: Zican Zha, Hao Tang, Yunlian Sun, Jinhui Tang

Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples. This task naturally inherits the main challenges of both few-shot learning and fine-grained recognition. First, the lack of labeled samples makes the learned model prone to overfitting. Second, the datasets exhibit high intra-class variance and low inter-class difference. To address this challenging task, we propose a two-stage background suppression and foreground alignment framework composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local-to-local (L2L) similarity metric. Specifically, the BAS generates a foreground mask for localization to weaken background disturbance and enhance the dominant foreground objects. Moreover, considering the lack of labeled samples, we compute the pairwise similarity of feature maps using both the raw image and the refined image. The FOA then reconstructs the feature map of each support sample according to its correlation to the query ones, which addresses the misalignment between support-query image pairs. To enable the proposed method to capture subtle differences in easily confused samples, we present a novel L2L similarity metric that further measures the local similarity between a pair of aligned spatial features in the embedding space. Extensive experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state of the art by a large margin.
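A plausible form of a local-to-local similarity metric is cosine similarity between every pair of spatial locations in two aligned feature maps, aggregated by each location's best match. The exact formulation below is an assumption, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def l2l_similarity(feat_a, feat_b):
    """feat_a, feat_b: (C, H, W) aligned feature maps.
    Average, over locations of feat_a, of the best cosine match in feat_b."""
    a = F.normalize(feat_a.flatten(1).T, dim=1)  # (H*W, C), unit rows
    b = F.normalize(feat_b.flatten(1).T, dim=1)
    sim = a @ b.T                                # pairwise local similarities
    return sim.max(dim=1).values.mean()
```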
PDF Preprint under review in TCSVT Journal

Click here to view paper screenshots

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

Authors: Chunyuan Li, Haotian Liu, Liunian Harold Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, Jianfeng Gao

Learning visual representations from natural language supervision has recently shown great promise in a number of pioneering works. In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks. However, it remains challenging to evaluate the transferability of these models due to the lack of easy-to-use evaluation toolkits and public benchmarks. To tackle this, we build ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer), the first benchmark and toolkit for evaluating (pre-trained) language-augmented visual models. ELEVATER is composed of three components. (i) Datasets. As downstream evaluation suites, it consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge. (ii) Toolkit. An automatic hyper-parameter tuning toolkit is developed to facilitate model evaluation on downstream tasks. (iii) Metrics. A variety of evaluation metrics are used to measure sample efficiency (zero-shot and few-shot) and parameter efficiency (linear probing and full-model fine-tuning). ELEVATER is a platform for Computer Vision in the Wild (CVinW) and is publicly released at https://computer-vision-in-the-wild.github.io/ELEVATER/
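Among the parameter-efficiency metrics, linear probing has a simple generic form: freeze the visual encoder and fit a linear classifier on its features. A sketch of that protocol under standard assumptions, not the ELEVATER toolkit's own code:

```python
from sklearn.linear_model import LogisticRegression

def linear_probe_accuracy(train_feats, train_labels, test_feats, test_labels):
    """Fit a linear classifier on frozen features; report test accuracy."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_feats, train_labels)
    return clf.score(test_feats, test_labels)
```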
PDF NeurIPS 2022 (Datasets and Benchmarks Track). The first two authors contribute equally. Benchmark page: https://computer-vision-in-the-wild.github.io/ELEVATER/

Click here to view paper screenshots

ThinkSum: Probabilistic reasoning over sets using large language models

Authors: Batu Ozturkler, Nikolay Malkin, Zhen Wang, Nebojsa Jojic

Large language models (LLMs) have a substantial capacity for high-level analogical reasoning: reproducing patterns in linear text that occur in their training data (zero-shot evaluation) or in the provided context (few-shot in-context learning). However, recent studies show that even the largest LLMs fail in scenarios that require reasoning over multiple objects or facts, or making sequences of logical deductions. We propose a two-stage probabilistic inference paradigm, ThinkSum, that reasons over sets of objects or facts in a structured manner. In the first stage (Think: "fast" retrieval of associations), an LLM is queried in parallel over a set of phrases extracted from the prompt or from an auxiliary model call. In the second stage (Sum: "slow" probabilistic inference or reasoning), the results of these queries are aggregated to make the final prediction. We demonstrate the advantages of ThinkSum on the BIG-bench suite of evaluation tasks, achieving improvements over the state of the art with GPT-family models on ten difficult tasks, often with far smaller model variants. We compare and contrast ThinkSum with other proposed modifications to direct prompting of LLMs, such as variants of chain-of-thought prompting. We argue that because the probabilistic inference in ThinkSum is performed outside of calls to the LLM, ThinkSum is less sensitive to prompt design, yields more interpretable predictions, and can be flexibly combined with latent variable models to extract structured knowledge from LLMs.
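The two-stage pattern is easy to express in outline: query the model for each extracted phrase (Think), then aggregate the per-phrase scores outside the model (Sum). In the sketch below, `llm_logprob` is a hypothetical stand-in for an LLM scoring call, and summing log-probabilities is just one simple aggregation choice.

```python
import math

def thinksum_choice(options, phrases, llm_logprob):
    """Pick the option whose summed log-probability across phrases is highest.
    llm_logprob(phrase, option) -> float is a stand-in for an LLM query."""
    scores = {
        option: math.fsum(llm_logprob(phrase, option) for phrase in phrases)
        for option in options
    }
    return max(scores, key=scores.get)
```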
PDF

Click here to view paper screenshots

Author: 木子已
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit the source 木子已 when reposting!