无监督/半监督/对比学习


2022-10-05 更新

CADet: Fully Self-Supervised Anomaly Detection With Contrastive Learning

Authors:Charles Guille-Escuret, Pau Rodriguez, David Vazquez, Ioannis Mitliagkas, Joao Monteiro

Handling out-of-distribution (OOD) samples has become a major stake in the real-world deployment of machine learning systems. This work explores the application of self-supervised contrastive learning to the simultaneous detection of two types of OOD samples: unseen classes and adversarial perturbations. Since in practice the distribution of such samples is not known in advance, we do not assume access to OOD examples. We show that similarity functions trained with contrastive learning can be leveraged with the maximum mean discrepancy (MMD) two-sample test to verify whether two independent sets of samples are drawn from the same distribution. Inspired by this approach, we introduce CADet (Contrastive Anomaly Detection), a method based on image augmentations to perform anomaly detection on single samples. CADet compares favorably to adversarial detection methods to detect adversarially perturbed samples on ImageNet. Simultaneously, it achieves comparable performance to unseen label detection methods on two challenging benchmarks: ImageNet-O and iNaturalist. CADet is fully self-supervised and requires neither labels for in-distribution samples nor access to OOD examples.
PDF

点此查看论文截图

Contrastive Corpus Attribution for Explaining Representations

Authors:Chris Lin, Hugh Chen, Chanwoo Kim, Su-In Lee

Despite the widespread use of unsupervised models, very few methods are designed to explain them. Most explanation methods explain a scalar model output. However, unsupervised models output representation vectors, the elements of which are not good candidates to explain because they lack semantic meaning. To bridge this gap, recent works defined a scalar explanation output: a dot product-based similarity in the representation space to the sample being explained (i.e., an explicand). Although this enabled explanations of unsupervised models, the interpretation of this approach can still be opaque because similarity to the explicand’s representation may not be meaningful to humans. To address this, we propose contrastive corpus similarity, a novel and semantically meaningful scalar explanation output based on a reference corpus and a contrasting foil set of samples. We demonstrate that contrastive corpus similarity is compatible with many post-hoc feature attribution methods to generate COntrastive COrpus Attributions (COCOA) and quantitatively verify that features important to the corpus are identified. We showcase the utility of COCOA in two ways: (i) we draw insights by explaining augmentations of the same image in a contrastive learning setting (SimCLR); and (ii) we perform zero-shot object localization by explaining the similarity of image representations to jointly learned text representations (CLIP).
PDF

点此查看论文截图

2022-10-05 更新

More Than Just Attention: Improving Cross-Modal Attentions with Contrastive Constraints for Image-Text Matching

Authors:Yuxiao Chen, Jianbo Yuan, Long Zhao, Tianlang Chen, Rui Luo, Larry Davis, Dimitris N. Metaxas

Cross-modal attention mechanisms have been widely applied to the image-text matching task and have achieved remarkable improvements thanks to its capability of learning fine-grained relevance across different modalities. However, the cross-modal attention models of existing methods could be sub-optimal and inaccurate because there is no direct supervision provided during the training process. In this work, we propose two novel training strategies, namely Contrastive Content Re-sourcing (CCR) and Contrastive Content Swapping (CCS) constraints, to address such limitations. These constraints supervise the training of cross-modal attention models in a contrastive learning manner without requiring explicit attention annotations. They are plug-in training strategies and can be easily integrated into existing cross-modal attention models. Additionally, we introduce three metrics including Attention Precision, Recall, and F1-Score to quantitatively measure the quality of learned attention models. We evaluate the proposed constraints by incorporating them into four state-of-the-art cross-modal attention-based image-text matching models. Experimental results on both Flickr30k and MS-COCO datasets demonstrate that integrating these constraints improves the model performance in terms of both retrieval performance and attention metrics.
PDF Accepted to WACV 2023

点此查看论文截图

文章作者: 木子已
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 木子已 !
  目录