2022-09-21 更新

On the Relation between Sensitivity and Accuracy in In-context Learning

Authors:Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown, He He

In-context learning (ICL) suffers from oversensitivity to the prompt, which makes it unreliable in real-world scenarios. We study the sensitivity of ICL with respect to multiple types of perturbations. First, we find that label bias obscures true ICL sensitivity, and hence prior work may have significantly underestimated the true ICL sensitivity. Second, we observe a strong negative correlation between ICL sensitivity and accuracy, with sensitive predictions less likely to be correct. Motivated by these observations, we propose \textsc{SenSel}, a few-shot selective prediction method based on ICL sensitivity. Experiments on ten classification benchmarks show that \textsc{SenSel} consistently outperforms a commonly used confidence-based selective prediction baseline.


Robust Prototypical Few-Shot Organ Segmentation with Regularized Neural-ODEs

Authors:Prashant Pandey, Mustafa Chasmai, Tanuj Sur, Brejesh Lall

Despite the tremendous progress made by deep learning models in image semantic segmentation, they typically require large annotated examples, and increasing attention is being diverted to problem settings like Few-Shot Learning (FSL) where only a small amount of annotation is needed for generalisation to novel classes. This is especially seen in medical domains where dense pixel-level annotations are expensive to obtain. In this paper, we propose Regularized Prototypical Neural Ordinary Differential Equation (R-PNODE), a method that leverages intrinsic properties of Neural-ODEs, assisted and enhanced by additional cluster and consistency losses to perform Few-Shot Segmentation (FSS) of organs. R-PNODE constrains support and query features from the same classes to lie closer in the representation space thereby improving the performance over the existing Convolutional Neural Network (CNN) based FSS methods. We further demonstrate that while many existing Deep CNN based methods tend to be extremely vulnerable to adversarial attacks, R-PNODE exhibits increased adversarial robustness for a wide array of these attacks. We experiment with three publicly available multi-organ segmentation datasets in both in-domain and cross-domain FSS settings to demonstrate the efficacy of our method. In addition, we perform experiments with seven commonly used adversarial attacks in various settings to demonstrate R-PNODE’s robustness. R-PNODE outperforms the baselines for FSS by significant margins and also shows superior performance for a wide array of attacks varying in intensity and design.


Adaptive Dimension Reduction and Variational Inference for Transductive Few-Shot Classification

Authors:Yuqing Hu, Stéphane Pateux, Vincent Gripon

Transductive Few-Shot learning has gained increased attention nowadays considering the cost of data annotations along with the increased accuracy provided by unlabelled samples in the domain of few shot. Especially in Few-Shot Classification (FSC), recent works explore the feature distributions aiming at maximizing likelihoods or posteriors with respect to the unknown parameters. Following this vein, and considering the parallel between FSC and clustering, we seek for better taking into account the uncertainty in estimation due to lack of data, as well as better statistical properties of the clusters associated with each class. Therefore in this paper we propose a new clustering method based on Variational Bayesian inference, further improved by Adaptive Dimension Reduction based on Probabilistic Linear Discriminant Analysis. Our proposed method significantly improves accuracy in the realistic unbalanced transductive setting on various Few-Shot benchmarks when applied to features used in previous studies, with a gain of up to $6\%$ in accuracy. In addition, when applied to balanced setting, we obtain very competitive results without making use of the class-balance artefact which is disputable for practical use cases. We also provide the performance of our method on a high performing pretrained backbone, with the reported results further surpassing the current state-of-the-art accuracy, suggesting the genericity of the proposed method.


Hierarchical Relational Learning for Few-Shot Knowledge Graph Completion

Authors:Han Wu, Jie Yin, Bala Rajaratnam, Jianyuan Guo

Knowledge graphs (KGs) are known for their large scale and knowledge inference ability, but are also notorious for the incompleteness associated with them. Due to the long-tail distribution of the relations in KGs, few-shot KG completion has been proposed as a solution to alleviate incompleteness and expand the coverage of KGs. It aims to make predictions for triplets involving novel relations when only a few training triplets are provided as reference. Previous methods have mostly focused on designing local neighbor aggregators to learn entity-level information and/or imposing sequential dependency assumption at the triplet level to learn meta relation information. However, valuable pairwise triplet-level interactions and context-level relational information have been largely overlooked for learning meta representations of few-shot relations. In this paper, we propose a hierarchical relational learning method (HiRe) for few-shot KG completion. By jointly capturing three levels of relational information (entity-level, triplet-level and context-level), HiRe can effectively learn and refine the meta representation of few-shot relations, and consequently generalize very well to new unseen relations. Extensive experiments on two benchmark datasets validate the superiority of HiRe against other state-of-the-art methods.
PDF 10 pages, 5 figures


Selective Token Generation for Few-shot Natural Language Generation

Authors:Daejin Jo, Taehwan Kwon, Eun-Sol Kim, Sungwoong Kim

Natural language modeling with limited training data is a challenging problem, and many algorithms make use of large-scale pretrained language models (PLMs) for this due to its great generalization ability. Among them, additive learning that incorporates a task-specific adapter on top of the fixed large-scale PLM has been popularly used in the few-shot setting. However, this added adapter is still easy to disregard the knowledge of the PLM especially for few-shot natural language generation (NLG) since an entire sequence is usually generated by only the newly trained adapter. Therefore, in this work, we develop a novel additive learning algorithm based on reinforcement learning (RL) that selectively outputs language tokens between the task-general PLM and the task-specific adapter during both training and inference. This output token selection over the two generators allows the adapter to take into account solely the task-relevant parts in sequence generation, and therefore makes it more robust to overfitting as well as more stable in RL training. In addition, to obtain the complementary adapter from the PLM for each few-shot task, we exploit a separate selecting module that is also simultaneously trained using RL. Experimental results on various few-shot NLG tasks including question answering, data-to-text generation and text summarization demonstrate that the proposed selective token generation significantly outperforms the previous additive learning algorithms based on the PLMs.


Knowledge Is Flat: A Seq2Seq Generative Framework for Various Knowledge Graph Completion

Authors:Chen Chen, Yufei Wang, Bing Li, Kwok-Yan Lam

Knowledge Graph Completion (KGC) has been recently extended to multiple knowledge graph (KG) structures, initiating new research directions, e.g. static KGC, temporal KGC and few-shot KGC. Previous works often design KGC models closely coupled with specific graph structures, which inevitably results in two drawbacks: 1) structure-specific KGC models are mutually incompatible; 2) existing KGC methods are not adaptable to emerging KGs. In this paper, we propose KG-S2S, a Seq2Seq generative framework that could tackle different verbalizable graph structures by unifying the representation of KG facts into “flat” text, regardless of their original form. To remedy the KG structure information loss from the “flat” text, we further improve the input representations of entities and relations, and the inference algorithm in KG-S2S. Experiments on five benchmarks show that KG-S2S outperforms many competitive baselines, setting new state-of-the-art performance. Finally, we analyze KG-S2S’s ability on the different relations and the Non-entity Generations.
PDF COLING 2022 Main Conference


Automatic Label Sequence Generation for Prompting Sequence-to-sequence Models

Authors:Zichun Yu, Tianyu Gao, Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Maosong Sun, Jie Zhou

Prompting, which casts downstream applications as language modeling tasks, has shown to be sample efficient compared to standard fine-tuning with pre-trained models. However, one pitfall of prompting is the need of manually-designed patterns, whose outcome can be unintuitive and requires large validation sets to tune. To tackle the challenge, we propose AutoSeq, a fully automatic prompting method: (1) We adopt natural language prompts on sequence-to-sequence models, enabling free-form generation and larger label search space; (2) We propose label sequences — phrases with indefinite lengths to verbalize the labels — which eliminate the need of manual templates and are more expressive than single label words; (3) We use beam search to automatically generate a large amount of label sequence candidates and propose contrastive re-ranking to get the best combinations. AutoSeq significantly outperforms other no-manual-design methods, such as soft prompt tuning, adapter tuning, and automatic search on single label words; the generated label sequences are even better than curated manual ones on a variety of tasks. Our method reveals the potential of sequence-to-sequence models in few-shot learning and sheds light on a path to generic and automatic prompting. The source code of this paper can be obtained from https://github.com/thunlp/Seq2Seq-Prompt.
PDF Accepted to COLING 2022


Few-Shot Non-Parametric Learning with Deep Latent Variable Model

Authors:Zhiying Jiang, Yiqin Dai, Ji Xin, Ming Li, Jimmy Lin

Most real-world problems that machine learning algorithms are expected to solve face the situation with 1) unknown data distribution; 2) little domain-specific knowledge; and 3) datasets with limited annotation. We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV), a learning framework for any dataset with abundant unlabeled data but very few labeled ones. By only training a generative model in an unsupervised way, the framework utilizes the data distribution to build a compressor. Using a compressor-based distance metric derived from Kolmogorov complexity, together with few labeled data, NPC-LV classifies without further training. We show that NPC-LV outperforms supervised methods on all three datasets on image classification in low data regime and even outperform semi-supervised learning methods on CIFAR-10. We demonstrate how and when negative evidence lowerbound (nELBO) can be used as an approximate compressed length for classification. By revealing the correlation between compression rate and classification accuracy, we illustrate that under NPC-LV, the improvement of generative models can enhance downstream classification accuracy.
PDF Accepted to NeurIPS2022


Smoothed Embeddings for Certified Few-Shot Learning

Authors:Mikhail Pautov, Olesya Kuznetsova, Nurislam Tursynbek, Aleksandr Petiushko, Ivan Oseledets

Randomized smoothing is considered to be the state-of-the-art provable defense against adversarial perturbations. However, it heavily exploits the fact that classifiers map input objects to class probabilities and do not focus on the ones that learn a metric space in which classification is performed by computing distances to embeddings of classes prototypes. In this work, we extend randomized smoothing to few-shot learning models that map inputs to normalized embeddings. We provide analysis of Lipschitz continuity of such models and derive robustness certificate against $\ell_2$-bounded perturbations that may be useful in few-shot learning scenarios. Our theoretical results are confirmed by experiments on different datasets.


Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Authors:Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan

When answering a question, humans utilize the information available across different modalities to synthesize a consistent and complete chain of thought (CoT). This process is normally a black box in the case of deep learning models like large-scale language models. Recently, science question benchmarks have been used to diagnose the multi-hop reasoning ability and interpretability of an AI system. However, existing datasets fail to provide annotations for the answers, or are restricted to the textual-only modality, small scales, and limited domain diversity. To this end, we present Science Question Answering (SQA), a new benchmark that consists of ~21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations. We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering SQA questions. SQA demonstrates the utility of CoT in language models, as CoT improves the question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA. We also explore the upper bound for models to leverage explanations by feeding those in the input; we observe that it improves the few-shot performance of GPT-3 by 18.96%. Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data.
PDF Accepted to NeurIPS 2022. 21 pages, 17 figures, 9 tables


Few-Shot Classification with Contrastive Learning

Authors:Zhanyuan Yang, Jinghua Wang, Yingying Zhu

A two-stage training paradigm consisting of sequential pre-training and meta-training stages has been widely used in current few-shot learning (FSL) research. Many of these methods use self-supervised learning and contrastive learning to achieve new state-of-the-art results. However, the potential of contrastive learning in both stages of FSL training paradigm is still not fully exploited. In this paper, we propose a novel contrastive learning-based framework that seamlessly integrates contrastive learning into both stages to improve the performance of few-shot classification. In the pre-training stage, we propose a self-supervised contrastive loss in the forms of feature vector vs. feature map and feature map vs. feature map, which uses global and local information to learn good initial representations. In the meta-training stage, we propose a cross-view episodic training mechanism to perform the nearest centroid classification on two different views of the same episode and adopt a distance-scaled contrastive loss based on them. These two strategies force the model to overcome the bias between views and promote the transferability of representations. Extensive experiments on three benchmark datasets demonstrate that our method achieves competitive results.
PDF To appear in ECCV 2022


On the Soft-Subnetwork for Few-shot Class Incremental Learning

Authors:Haeyong Kang, Jaehong Yoon, Sultan Rizky Hikmawan Madjid, Sung Ju Hwang, Chang D. Yoo

Inspired by Regularized Lottery Ticket Hypothesis (RLTH), which hypothesizes that there exist smooth (non-binary) subnetworks within a dense network that achieve the competitive performance of the dense network, we propose a few-shot class incremental learning (FSCIL) method referred to as \emph{Soft-SubNetworks (SoftNet)}. Our objective is to learn a sequence of sessions incrementally, where each session only includes a few training instances per class while preserving the knowledge of the previously learned ones. SoftNet jointly learns the model weights and adaptive non-binary soft masks at a base training session in which each mask consists of the major and minor subnetwork; the former aims to minimize catastrophic forgetting during training, and the latter aims to avoid overfitting to a few samples in each new training session. We provide comprehensive empirical validations demonstrating that our SoftNet effectively tackles the few-shot incremental learning problem by surpassing the performance of state-of-the-art baselines over benchmark datasets.


文章作者: 木子已
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 木子已 !