2022-09-29 Update
MetaPrompting: Learning to Learn Better Prompts
Authors: Yutai Hou, Hongyuan Dong, Xinghao Wang, Bohan Li, Wanxiang Che
Prompting methods are regarded as one of the crucial advances in few-shot natural language processing. Recent research on prompting has moved from discrete-token-based "hard prompts" to continuous "soft prompts", which employ learnable vectors as pseudo prompt tokens and achieve better performance. Though showing promising prospects, these soft-prompting methods are observed to rely heavily on good initialization to take effect. Unfortunately, obtaining a perfect initialization for soft prompts requires an understanding of language models' inner workings and elaborate design, which is no easy task and has to restart from scratch for each new task. To remedy this, we propose a generalized soft-prompting method called MetaPrompting, which adopts the well-recognized model-agnostic meta-learning algorithm to automatically find a better prompt initialization that facilitates fast adaptation to new prompting tasks. Extensive experiments show that MetaPrompting tackles the soft-prompt initialization problem and brings significant improvements on four different datasets (over 6 points of accuracy improvement in the 1-shot setting), achieving new state-of-the-art performance.
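MetaPrompting's core recipe, MAML over soft-prompt parameters, can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the frozen linear head stands in for a real language model, and all dimensions, learning rates, and data are toy placeholders.

```python
# Minimal sketch of MAML-style soft-prompt initialization (toy setup).
import torch

d, n_cls, prompt_len = 32, 5, 4
prompt_init = torch.randn(prompt_len, d, requires_grad=True)  # the meta-learned init
head = torch.nn.Linear(prompt_len * d + d, n_cls)             # stand-in for a frozen LM head
meta_opt = torch.optim.Adam([prompt_init], lr=1e-3)           # only the init is optimized

def task_loss(prompt, x, y):
    # Prepend the (flattened) soft prompt to each input representation.
    inp = torch.cat([prompt.reshape(-1).expand(x.size(0), -1), x], dim=-1)
    return torch.nn.functional.cross_entropy(head(inp), y)

for episode in range(100):
    x_s, y_s = torch.randn(8, d), torch.randint(0, n_cls, (8,))  # support set
    x_q, y_q = torch.randn(8, d), torch.randint(0, n_cls, (8,))  # query set
    # Inner loop: adapt the prompt to the support set, starting from the shared init.
    prompt = prompt_init
    for _ in range(3):
        (grad,) = torch.autograd.grad(task_loss(prompt, x_s, y_s), prompt,
                                      create_graph=True)  # keep graph for meta-gradient
        prompt = prompt - 0.1 * grad
    # Outer loop: update the initialization with the query-set loss.
    meta_opt.zero_grad()
    task_loss(prompt, x_q, y_q).backward()
    meta_opt.step()
```

The key point is that `create_graph=True` keeps the inner-loop adaptation differentiable, so the outer step improves the initialization itself rather than any single adapted prompt.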
PDF
Click here to view paper screenshots
CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention
Authors: Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzheng Ma, Xupeng Miao, Xuming He, Bin Cui
Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual representations with great transferability, achieving promising accuracy for zero-shot classification. To further improve its downstream performance, existing works propose additional learnable modules upon CLIP and fine-tune them on few-shot training sets. However, the resulting extra training cost and data requirement severely hinder the efficiency of model deployment and knowledge transfer. In this paper, we introduce a free-lunch enhancement method, CALIP, to boost CLIP's zero-shot performance via a parameter-free attention module. Specifically, we guide visual and textual representations to interact with each other and explore cross-modal informative features via attention. As pre-training has largely reduced the embedding distance between the two modalities, we discard all learnable parameters in the attention and bidirectionally update the multi-modal features, making the whole process parameter-free and training-free. In this way, the images are blended with textual-aware signals and the text representations become visual-guided for better adaptive zero-shot alignment. We evaluate CALIP on 14 benchmark datasets spanning both 2D image and 3D point cloud few-shot classification, showing consistent zero-shot performance improvement over CLIP. On top of that, we further insert a small number of linear layers into CALIP's attention module and verify its robustness under few-shot settings, where it also achieves leading performance compared to existing methods. These extensive experiments demonstrate the superiority of our approach for efficient enhancement of CLIP.
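The parameter-free attention step is simple enough to sketch directly. The following is an illustrative reconstruction from the abstract's description only; the temperature and blending weight are arbitrary choices, not the paper's values.

```python
# Sketch of parameter-free bidirectional cross-modal attention (illustrative).
import torch
import torch.nn.functional as F

def parameter_free_attention(fv, ft, tau=1.0, alpha=0.5):
    """fv: (HW, d) visual tokens; ft: (K, d) class text embeddings."""
    fv, ft = F.normalize(fv, dim=-1), F.normalize(ft, dim=-1)
    attn = fv @ ft.t() / tau                       # (HW, K) similarity logits
    fv_new = F.softmax(attn, dim=-1) @ ft          # textual-aware visual features
    ft_new = F.softmax(attn.t(), dim=-1) @ fv      # visual-guided text features
    # Blend updated features with the originals; no learnable parameters anywhere.
    return alpha * fv + (1 - alpha) * fv_new, alpha * ft + (1 - alpha) * ft_new

# Zero-shot logits: pooled visual feature against the updated text features.
fv, ft = torch.randn(49, 512), torch.randn(10, 512)
fv2, ft2 = parameter_free_attention(fv, ft)
logits = F.normalize(fv2.mean(0), dim=-1) @ F.normalize(ft2, dim=-1).t()
```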
PDF 12 pages, 6 figures
Click here to view paper screenshots
Learning Relation-Specific Representations for Few-shot Knowledge Graph Completion
Authors: Yuling Li, Kui Yu, Yuhong Zhang, Xindong Wu
Recent years have witnessed increasing interest in few-shot knowledge graph completion (FKGC), which aims to infer unseen query triples for a few-shot relation using a few reference triples about the relation. The primary focus of existing FKGC methods lies in learning relation representations that can reflect the common information shared by the query and reference triples. To this end, these methods learn entity-pair representations from the direct neighbors of head and tail entities, and then aggregate the representations of reference entity pairs. However, the entity-pair representations learned only from direct neighbors may have low expressiveness when the involved entities have sparse direct neighbors or share a common local neighborhood with other entities. Moreover, merely modeling the semantic information of head and tail entities is insufficient to accurately infer their relational information especially when they have multiple relations. To address these issues, we propose a Relation-Specific Context Learning (RSCL) framework, which exploits graph contexts of triples to learn global and local relation-specific representations for few-shot relations. Specifically, we first extract graph contexts for each triple, which can provide long-term entity-relation dependencies. To encode the extracted graph contexts, we then present a hierarchical attention network to capture contextualized information of triples and highlight valuable local neighborhood information of entities. Finally, we design a hybrid attention aggregator to evaluate the likelihood of the query triples at the global and local levels. Experimental results on two public datasets demonstrate that RSCL outperforms state-of-the-art FKGC methods.
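As a rough intuition for the aggregation-and-scoring step common to FKGC methods of this kind, here is a toy attention aggregator over reference entity-pair embeddings. RSCL's actual hierarchical attention over global and local graph contexts is considerably more elaborate; all names and shapes below are illustrative assumptions.

```python
# Toy attention aggregator for few-shot KG completion (illustrative only).
import torch
import torch.nn.functional as F

def aggregate_references(refs, query):
    """refs: (K, d) reference entity-pair embeddings; query: (d,) query-pair embedding.
    Returns a query-aware relation representation and a matching score."""
    weights = F.softmax(refs @ query, dim=0)   # attend to the most relevant references
    relation = weights @ refs                  # aggregated relation representation
    score = F.cosine_similarity(relation, query, dim=0)  # likelihood proxy for the query
    return relation, score

refs = torch.randn(3, 64)   # K=3 reference triples for a few-shot relation
query = torch.randn(64)     # candidate query triple embedding
relation, score = aggregate_references(refs, query)
```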
PDF 12 pages, 6 figures
Click here to view paper screenshots
Revisiting Few-Shot Learning from a Causal Perspective
Authors: Guoliang Lin, Hanjiang Lai
Few-shot learning with the N-way K-shot scheme is an open challenge in machine learning. Many approaches have been proposed to tackle this problem, e.g., Matching Networks and CLIP-Adapter. Although these approaches have shown significant progress, the mechanism behind their success has not been well explored. In this paper, we interpret these few-shot learning methods via causal mechanisms. We show that the existing approaches can be viewed as specific forms of front-door adjustment, which removes the effects of confounders. Based on this, we introduce a general causal method for few-shot learning that considers not only the relationship between examples but also the diversity of representations. Experimental results demonstrate the superiority of our proposed method in few-shot classification on various benchmark datasets. Code is available in the supplementary material.
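For readers unfamiliar with front-door adjustment, the quantity it computes is P(y | do(x)) = Σ_m P(m|x) Σ_x' P(y|m,x') P(x'), where m is a mediator that screens off the confounded path. A toy numeric illustration with made-up distributions:

```python
# Front-door adjustment on toy binary variables (distributions are made up).
import numpy as np

P_x = np.array([0.6, 0.4])                  # P(x'), marginal over treatments
P_m_given_x = np.array([[0.7, 0.3],         # P(m|x): rows index x, cols index m
                        [0.2, 0.8]])
P_y_given_mx = np.array([[[0.9, 0.1],       # P(y|m,x'): indexed [m][x'][y]
                          [0.6, 0.4]],
                         [[0.3, 0.7],
                          [0.1, 0.9]]])

def front_door(x):
    # Marginalize over the mediator m and the (confounded) treatment x'.
    return sum(P_m_given_x[x, m] *
               sum(P_x[xp] * P_y_given_mx[m, xp] for xp in range(2))
               for m in range(2))

print(front_door(0))  # interventional distribution P(y | do(x=0))
```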
PDF
Click here to view paper screenshots
Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning
Authors: Xingping Dong, Jianbing Shen, Ling Shao
The pioneering method for unsupervised meta-learning, CACTUs, is a clustering-based approach with pseudo-labeling. This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data. However, it often suffers from label inconsistency or limited diversity, which leads to poor performance. In this work, we prove that the core reason for this is the lack of a clustering-friendly property in the embedding space. We address this by minimizing the inter- to intra-class similarity ratio to provide clustering-friendly embedding features, and validate our approach through comprehensive experiments. Note that, despite only utilizing a simple clustering algorithm (k-means) in our embedding space to obtain the pseudo-labels, we achieve significant improvement. Moreover, we adopt a progressive evaluation mechanism to obtain more diverse samples and further alleviate the limited-diversity problem. Finally, our approach is model-agnostic and can easily be integrated into existing supervised methods. To demonstrate its generalization ability, we integrate it into two representative algorithms: MAML and EP. The results on three main few-shot benchmarks clearly show that the proposed method achieves significant improvement over state-of-the-art models. Notably, our approach also outperforms the corresponding supervised method on two tasks.
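The two ingredients described above, k-means pseudo-labels plus an inter-/intra-class similarity-ratio objective, can be sketched as follows. The exact loss form and hyperparameters here are illustrative guesses, not the paper's.

```python
# Sketch: k-means pseudo-labels and an inter-/intra-class similarity-ratio loss.
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def pseudo_labels(emb, n_clusters=5):
    # Cluster the (detached) embeddings; cluster ids serve as pseudo-labels.
    return torch.as_tensor(KMeans(n_clusters, n_init=10)
                           .fit_predict(emb.detach().numpy()))

def similarity_ratio_loss(emb, labels):
    emb = F.normalize(emb, dim=-1)
    sim = emb @ emb.t()                                 # pairwise cosine similarity
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-cluster pair mask
    eye = torch.eye(len(labels), dtype=torch.bool)      # exclude self-similarity
    intra = sim[same & ~eye].mean()                     # pull same-cluster pairs together
    inter = sim[~same].mean()                           # push different clusters apart
    return inter / intra.clamp_min(1e-6)                # minimize inter/intra ratio

emb = torch.randn(64, 128, requires_grad=True)
loss = similarity_ratio_loss(emb, pseudo_labels(emb))
loss.backward()
```

Driving this ratio down makes the embedding space "clustering-friendly", so even plain k-means yields consistent pseudo-labels.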
PDF Accepted by European Conference on Computer Vision (ECCV), 2022
Click here to view paper screenshots
An Embarrassingly Simple Approach to Semi-Supervised Few-Shot Learning
Authors: Xiu-Shen Wei, He-Yang Xu, Faen Zhang, Yuxin Peng, Wei Zhou
Semi-supervised few-shot learning consists of training a classifier to adapt to new tasks with limited labeled data and a fixed quantity of unlabeled data. Many sophisticated methods have been developed to address the challenges this problem poses. In this paper, we propose a simple but quite effective approach that predicts accurate negative pseudo-labels for unlabeled data from an indirect learning perspective, and then augments the extremely label-constrained support set in few-shot classification tasks. Our approach can be implemented in just a few lines of code using only off-the-shelf operations, yet it outperforms state-of-the-art methods on four benchmark datasets.
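In the spirit of the claimed few-line implementation, here is a hedged sketch of negative pseudo-labeling: each unlabeled sample is assigned the class it is least likely to belong to, a label that is far easier to get right than a positive one. The thresholding scheme below is an assumption, not the paper's exact procedure.

```python
# Sketch of negative pseudo-labeling (threshold is an illustrative choice).
import torch

def negative_pseudo_labels(probs, threshold=0.05):
    """probs: (N, C) softmax outputs on unlabeled data.
    Returns indices and negative labels for confident predictions."""
    neg_prob, neg_label = probs.min(dim=-1)   # least likely class per sample
    keep = neg_prob < threshold               # keep only confident negatives
    return keep.nonzero(as_tuple=True)[0], neg_label[keep]

probs = torch.softmax(torch.randn(100, 5), dim=-1)
idx, neg = negative_pseudo_labels(probs)
# The resulting (sample, not-this-class) pairs can augment the support set,
# e.g. via a complementary-label loss of the form -log(1 - p[neg]).
```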
PDF Accepted by NeurIPS 2022
Click here to view paper screenshots
WeLM: A Well-Read Pre-trained Language Model for Chinese
Authors: Hui Su, Xiao Zhou, Houjin Yu, Yuwen Chen, Zilin Zhu, Yang Yu, Jie Zhou
Large Language Models pre-trained with self-supervised learning have demonstrated impressive zero-shot generalization capabilities on a wide spectrum of tasks. In this work, we present WeLM: a well-read pre-trained language model for Chinese that is able to seamlessly perform different types of tasks with zero or few-shot demonstrations. WeLM is trained with 10B parameters by “reading” a curated high-quality corpus covering a wide range of topics. We show that WeLM is equipped with broad knowledge of various domains and languages. On 18 monolingual (Chinese) tasks, WeLM significantly outperforms existing pre-trained models of similar size and matches the performance of models up to 25 times larger. WeLM also exhibits strong capabilities in multilingual and code-switching understanding, outperforming existing multilingual language models pre-trained on 30 languages. Furthermore, we collected human-written prompts for a large set of supervised datasets in Chinese and fine-tuned WeLM with multi-prompted training. The resulting model attains strong generalization on unseen types of tasks and outperforms the unsupervised WeLM in zero-shot learning. Finally, we demonstrate that WeLM has basic skills in explaining and calibrating its own decisions, which are promising directions for future research. Our models can be accessed via https://welm.weixin.qq.com/docs/api/.
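A minimal illustration of the zero-/few-shot usage pattern the abstract describes, with a made-up task; the real request schema is documented at the link above and is not reproduced here.

```python
# A few-shot prompt with in-context demonstrations (task and examples are
# made up; consult the linked API docs for the actual request format).
few_shot_prompt = (
    "Translate Chinese to English.\n"
    "中文: 你好 -> English: Hello\n"
    "中文: 谢谢 -> English: Thank you\n"
    "中文: 再见 -> English:"
)
# A well-read model should complete this with "Goodbye" from the pattern alone.
```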
PDF