Updated 2023-07-01
Inter-Instance Similarity Modeling for Contrastive Learning
Authors: Chengchao Shen, Dawei Liu, Hao Tang, Zhe Qu, Jianxin Wang
Existing contrastive learning methods widely adopt one-hot instance discrimination as the pretext task for self-supervised learning, which inevitably neglects the rich inter-instance similarities among natural images and can lead to representation degeneration. In this paper, we propose a novel image mix method, PatchMix, for contrastive learning in Vision Transformer (ViT), to model inter-instance similarities among images. Following the nature of ViT, we randomly mix multiple images from a mini-batch at the patch level to construct mixed image patch sequences for ViT. Compared to existing sample mix methods, PatchMix can flexibly and efficiently mix more than two images and simulate more complicated similarity relations among natural images. In this manner, our contrastive framework can significantly reduce the gap between the contrastive objective and the ground truth in reality. Experimental results demonstrate that our proposed method significantly outperforms the previous state of the art on both ImageNet-1K and CIFAR datasets, e.g., a 3.0% linear accuracy improvement on ImageNet-1K and an 8.7% kNN accuracy improvement on CIFAR100. Moreover, our method achieves leading transfer performance on downstream tasks, namely object detection and instance segmentation on the COCO dataset. The code is available at https://github.com/visresearch/patchmix
PDF
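To make the patch-level mixing described in the PatchMix abstract more concrete, here is a minimal sketch. It is not the authors' implementation: the function name `patch_mix`, the `num_mix` parameter, the uniform per-patch choice of source image, and the soft targets (the fraction of patches taken from each image) are all assumptions made for illustration.

```python
import random


def patch_mix(batch, num_mix=3, seed=0):
    """Mix patch sequences within a mini-batch (illustrative sketch only).

    batch: list of images, each a list of patches of equal length.
    Returns (mixed, targets): mixed[i] is a patch sequence assembled from
    up to `num_mix` images; targets[i][j] is the fraction of patches in
    mixed[i] taken from image j, usable as a soft similarity target.
    """
    rng = random.Random(seed)
    n = len(batch)
    num_patches = len(batch[0])
    mixed, targets = [], []
    for i in range(n):
        # Assumption: every mixed sequence keeps image i as one source.
        others = [j for j in range(n) if j != i]
        sources = [i] + rng.sample(others, min(num_mix - 1, len(others)))
        sequence = []
        weights = [0.0] * n
        for p in range(num_patches):
            src = rng.choice(sources)       # source image for position p
            sequence.append(batch[src][p])  # patch keeps its position
            weights[src] += 1.0 / num_patches
        mixed.append(sequence)
        targets.append(weights)
    return mixed, targets


# Toy batch: 4 "images" of 16 patches, each patch tagged (image_id, position).
batch = [[(i, p) for p in range(16)] for i in range(4)]
mixed, targets = patch_mix(batch, num_mix=3)
```

Each row of `targets` sums to 1, so it can stand in for the one-hot label that plain instance discrimination would use.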
FBA-Net: Foreground and Background Aware Contrastive Learning for Semi-Supervised Atrium Segmentation
Authors: Yunsung Chung, Chanho Lim, Chao Huang, Nassir Marrouche, Jihun Hamm
Medical image segmentation of gadolinium enhancement magnetic resonance imaging (GE MRI) is an important task in clinical applications. However, manual annotation is time-consuming and requires specialized expertise. Semi-supervised segmentation methods that leverage both labeled and unlabeled data have shown promise, with contrastive learning emerging as a particularly effective approach. In this paper, we propose a contrastive learning strategy of foreground and background representations for semi-supervised 3D medical image segmentation (FBA-Net). Specifically, we leverage the contrastive loss to learn representations of both the foreground and background regions in the images. By training the network to distinguish between foreground-background pairs, we aim to learn a representation that can effectively capture the anatomical structures of interest. Experiments on three medical segmentation datasets demonstrate state-of-the-art performance. Notably, our method achieves a Dice score of 91.31% with only 20% labeled data, which is remarkably close to the 91.62% score of the fully supervised method that uses 100% labeled data on the left atrium dataset. Our framework has the potential to advance the field of semi-supervised 3D medical image segmentation and enable more efficient and accurate analysis of medical images with a limited amount of annotated labels.
PDF 11 pages, 2 figures
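The FBA-Net results above are reported as Dice scores. As a reminder of what that metric measures, here is a minimal sketch for binary masks. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient for two flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# One overlapping foreground voxel, 2 predicted and 1 true: 2*1 / (2+1)
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```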
Dental CLAIRES: Contrastive LAnguage Image REtrieval Search for Dental Research
Authors: Tanjida Kabir, Luyao Chen, Muhammad F Walji, Luca Giancardo, Xiaoqian Jiang, Shayan Shams
Learning about diagnostic features and related clinical information from dental radiographs is important for dental research. However, the lack of expert-annotated data and convenient search tools poses challenges. Our primary objective is to design a search tool that uses a user’s query for oral-related research. The proposed framework, Contrastive LAnguage Image REtrieval Search for dental research (Dental CLAIRES), utilizes periapical radiographs and associated clinical details, such as periodontal diagnosis and demographic information, to retrieve the best-matched images based on the text query. We applied a contrastive representation learning method to find images described by the user’s text by maximizing the similarity score of positive pairs (true pairs) and minimizing the score of negative pairs (random pairs). Our model achieved a hit@3 ratio of 96% and a Mean Reciprocal Rank (MRR) of 0.82. We also designed a graphical user interface that allows researchers to verify the model’s performance with interactions.
PDF 10 pages, 7 figures, 4 tables
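The Dental CLAIRES abstract reports hit@3 and Mean Reciprocal Rank. The sketch below shows how these two retrieval metrics are usually computed from the 1-based rank at which the correct image appears for each query. The function names and the toy ranks are hypothetical and do not come from the paper.

```python
def hit_at_k(ranks, k):
    """Fraction of queries whose correct item appears within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy example: rank of the true image for four text queries.
ranks = [1, 2, 5, 1]
hit3 = hit_at_k(ranks, 3)          # 3 of 4 queries hit within the top 3
mrr = mean_reciprocal_rank(ranks)  # (1 + 1/2 + 1/5 + 1) / 4
```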
GraSS: Contrastive Learning with Gradient Guided Sampling Strategy for Remote Sensing Image Semantic Segmentation
Authors: Zhaoyang Zhang, Zhen Ren, Chao Tao, Yunsheng Zhang, Chengli Peng, Haifeng Li
Self-supervised contrastive learning (SSCL) has achieved significant milestones in remote sensing image (RSI) understanding. Its essence lies in designing an unsupervised instance discrimination pretext task to extract image features from a large number of unlabeled images that are beneficial for downstream tasks. However, existing instance-discrimination-based SSCL methods suffer from two limitations when applied to the RSI semantic segmentation task: 1) a positive sample confounding issue; 2) a feature adaptation bias, which arises when they are applied to semantic segmentation tasks that require pixel-level or object-level features. In this study, we observe that the discrimination information can be mapped to specific regions in RSI through the gradient of the unsupervised contrastive loss, and that these regions tend to contain singular ground objects. Based on this, we propose contrastive learning with Gradient guided Sampling Strategy (GraSS) for RSI semantic segmentation. GraSS consists of two stages: Instance Discrimination warm-up (ID warm-up) and Gradient guided Sampling contrastive training (GS training). The ID warm-up aims to provide initial discrimination information to the contrastive loss gradients. The GS training stage aims to utilize the discrimination information contained in the contrastive loss gradients and adaptively select regions in RSI patches that contain more singular ground objects, in order to construct new positive and negative samples. Experimental results on three open datasets demonstrate that GraSS effectively enhances the performance of SSCL in high-resolution RSI semantic segmentation. Compared to seven baseline methods from five different types of SSCL, GraSS achieves an average improvement of 1.57% and a maximum improvement of 3.58% in terms of mean intersection over union (mIoU). The source code is available at https://github.com/GeoX-Lab/GraSS
PDF 12 pages, 9 figures
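The GraSS improvements above are measured in mean intersection over union. For readers unfamiliar with the metric, here is a minimal sketch over flat label sequences. The function name `mean_iou` and the choice to skip classes that appear in neither prediction nor ground truth are assumptions; evaluation code in practice may handle absent classes differently.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes for flat label lists."""
    if len(pred) != len(target):
        raise ValueError("label sequences must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumed convention: ignore classes absent from both
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


# Class 0: IoU 1/2, class 1: IoU 2/3, so the mean is 7/12.
miou = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```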
Multi-network Contrastive Learning Based on Global and Local Representations
Authors: Weiquan Li, Xianzhong Long, Yun Li
The popularity of self-supervised learning has made it possible to train models without relying on labeled data, which saves expensive annotation costs. However, most existing self-supervised contrastive learning methods overlook the combination of global and local feature information. This paper proposes a multi-network contrastive learning framework based on global and local representations. We introduce global and local feature information for self-supervised contrastive learning through multiple networks. The model learns feature information at different scales of an image by contrasting the embedding pairs generated by multiple networks. The framework also expands the number of samples used for contrast and improves the training efficiency of the model. Linear evaluation results on three benchmark datasets show that our method outperforms several existing classical self-supervised learning methods.
PDF
Semantic Positive Pairs for Enhancing Contrastive Instance Discrimination
Authors: Mohammad Alkhalefi, Georgios Leontidis, Mingjun Zhong
Self-supervised learning algorithms based on instance discrimination effectively prevent representation collapse and produce promising results in representation learning. However, the process of attracting positive pairs (i.e., two views of the same instance) in the embedding space and repelling all other instances (i.e., negative pairs) irrespective of their categories could result in discarding important features. To address this issue, we propose an approach that identifies images with similar semantic content and treats them as positive instances, named the semantic positive pairs set (SPPS), thereby reducing the risk of discarding important features during representation learning. Our approach could work with any contrastive instance discrimination framework such as SimCLR or MoCo. We conduct experiments on three datasets, ImageNet, STL-10 and CIFAR-10, to evaluate our approach. The experimental results show that our approach consistently outperforms the vanilla SimCLR baseline across all three datasets; for example, it improves upon vanilla SimCLR under the linear evaluation protocol by 4.18% on ImageNet with a batch size of 1024 and 800 epochs.
PDF 12 pages, 7 figures
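The abstract above describes treating images with similar semantic content as extra positive pairs. One simple way to illustrate the idea is to threshold cosine similarity between precomputed embeddings, as sketched below. This is an assumption made for illustration, not the paper's procedure: the function names, the source of the embeddings, and the 0.9 threshold are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_positive_pairs(embeddings, threshold=0.9):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold.

    The threshold is a hypothetical value chosen for this toy example.
    """
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Images 0 and 1 point in nearly the same direction; image 2 does not.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
pairs = semantic_positive_pairs(embeddings)
```

Pairs found this way would be added to the positives used by the contrastive loss instead of being repelled as negatives.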