Updated 2023-06-27
Devil is in Channels: Contrastive Single Domain Generalization for Medical Image Segmentation
Authors: Shishuai Hu, Zehui Liao, Yong Xia
Deep learning-based medical image segmentation models suffer from performance degradation when deployed to a new healthcare center. To address this issue, unsupervised domain adaptation and multi-source domain generalization methods have been proposed; however, they are less favorable for clinical practice due to the cost of acquiring target-domain data and the privacy concerns associated with redistributing data from multiple source domains. In this paper, we propose a Channel-level Contrastive Single Domain Generalization (C²SDG) model for medical image segmentation. In C²SDG, the shallow features of each image and its style-augmented counterpart are extracted and used for contrastive training, yielding disentangled style representations and structure representations. Segmentation is performed based solely on the structure representations. Our method is novel in its contrastive perspective, which enables channel-wise feature disentanglement using a single source domain. We evaluated C²SDG against six SDG methods on a multi-domain joint optic cup and optic disc segmentation benchmark. Our results suggest the effectiveness of each module in C²SDG and also indicate that C²SDG outperforms the baseline and all competing methods by a large margin. The code will be available at https://github.com/ShishuaiHu/CCSDG.
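A minimal sketch of the idea as described in the abstract (not the authors' released code): shallow features are split channel-wise into style and structure halves, and a contrastive term makes the structure channels agree between an image and its style-augmented view. The encoder, channel counts, augmentation, and loss form are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowEncoder(nn.Module):
    """Toy shallow feature extractor; its output channels are split into style/structure."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        f = self.conv(x)
        style, structure = torch.chunk(f, 2, dim=1)  # channel-level split: 32 + 32 channels
        return style, structure

def contrastive_loss(a, b, temperature=0.1):
    """InfoNCE over globally pooled features, matching each image to its own augmented view."""
    z1 = F.normalize(a.mean(dim=(2, 3)), dim=1)
    z2 = F.normalize(b.mean(dim=(2, 3)), dim=1)
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

encoder = ShallowEncoder()
x = torch.randn(4, 3, 128, 128)          # source-domain batch
x_aug = x + 0.1 * torch.randn_like(x)    # stand-in for a style augmentation
_, struct_o = encoder(x)
_, struct_a = encoder(x_aug)
loss = contrastive_loss(struct_o, struct_a)  # structure channels should agree across styles
# A complementary term on the style channels and the segmentation loss on the structure
# channels (both omitted here) would complete the training objective.
```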
PDF MICCAI 2023 Early Accept (Top 14%), 12 pages, 5 figures
Inter-Instance Similarity Modeling for Contrastive Learning
Authors: Chengchao Shen, Dawei Liu, Hao Tang, Zhe Qu, Jianxin Wang
Existing contrastive learning methods widely adopt one-hot instance discrimination as the pretext task for self-supervised learning, which inevitably neglects the rich inter-instance similarities among natural images and can lead to representation degeneration. In this paper, we propose a novel image mix method, PatchMix, for contrastive learning in Vision Transformers (ViT), to model inter-instance similarities among images. Following the nature of ViT, we randomly mix multiple images from a mini-batch at the patch level to construct mixed image patch sequences for ViT. Compared to existing sample mix methods, PatchMix can flexibly and efficiently mix more than two images and simulate more complicated similarity relations among natural images. In this manner, our contrastive framework can significantly reduce the gap between the contrastive objective and the ground-truth similarities of real data. Experimental results demonstrate that our proposed method significantly outperforms the previous state of the art on both the ImageNet-1K and CIFAR datasets, e.g., a 3.0% linear accuracy improvement on ImageNet-1K and an 8.7% kNN accuracy improvement on CIFAR100. Moreover, our method achieves leading transfer performance on downstream tasks, namely object detection and instance segmentation on the COCO dataset. The code is available at https://github.com/visresearch/patchmix.
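A hedged sketch of patch-level mixing as the abstract describes it (patch size, number of donor images, and helper names are assumptions, not the released PatchMix implementation): each output sequence draws every patch from one of several randomly chosen images in the mini-batch.

```python
import torch

def patchify(images, patch_size=16):
    """(B, C, H, W) -> (B, N, C*patch_size*patch_size) patch sequences."""
    B, C, H, W = images.shape
    p = patch_size
    x = images.reshape(B, C, H // p, p, W // p, p)
    x = x.permute(0, 2, 4, 1, 3, 5).reshape(B, -1, C * p * p)
    return x

def patch_mix(images, num_sources=4, patch_size=16):
    """Each mixed sequence takes every patch from one of `num_sources` random batch images."""
    patches = patchify(images, patch_size)             # (B, N, D)
    B, N, _ = patches.shape
    donors = torch.randint(0, B, (B, num_sources))     # candidate source images per sample
    choice = torch.randint(0, num_sources, (B, N))     # which donor supplies each patch
    src = torch.gather(donors, 1, choice)              # (B, N) image index per patch
    mixed = patches[src, torch.arange(N).expand(B, N)]
    return mixed, src                                  # src also defines soft mixing targets

imgs = torch.randn(8, 3, 224, 224)
mixed_seq, sources = patch_mix(imgs)
print(mixed_seq.shape)  # torch.Size([8, 196, 768])
```

The `sources` index map records how many patches each image contributed, which is what a mixed (non-one-hot) contrastive target would be built from.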
Structuring Representation Geometry with Rotationally Equivariant Contrastive Learning
Authors: Sharut Gupta, Joshua Robinson, Derek Lim, Soledad Villar, Stefanie Jegelka
Self-supervised learning converts raw perceptual data such as images into a compact space where simple Euclidean distances measure meaningful variations in the data. In this paper, we extend this formulation by adding further geometric structure to the embedding space, enforcing that transformations of the input space correspond to simple (i.e., linear) transformations of the embedding space. Specifically, in the contrastive learning setting, we introduce an equivariance objective and theoretically prove that its minima force augmentations of the input space to correspond to rotations on the spherical embedding space. We show that merely combining our equivariant loss with a non-collapse term results in non-trivial representations, without requiring invariance to data augmentations. Optimal performance is achieved by also encouraging approximate invariance, where input augmentations correspond to small rotations. Our method, CARE: Contrastive Augmentation-induced Rotational Equivariance, improves performance on downstream tasks and ensures that the embedding space is sensitive to important variations in the data (e.g., color) that standard contrastive methods do not capture. Code is available at https://github.com/Sharut/CARE.
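A rough sketch in the spirit of the stated objective (my own simplification, not the official CARE loss): since rotations preserve inner products, applying the same augmentation to every image in a batch should leave pairwise embedding similarities unchanged, which can be penalized directly.

```python
import torch
import torch.nn.functional as F

def rotation_equivariance_loss(z, z_aug):
    """z, z_aug: (B, D) embeddings of the originals and of their identically augmented views.
    Penalizes changes in pairwise similarity, which rotations of the sphere would preserve."""
    z = F.normalize(z, dim=1)
    z_aug = F.normalize(z_aug, dim=1)
    sim = z @ z.t()          # pairwise similarities before augmentation
    sim_aug = z_aug @ z_aug.t()  # ... and after augmentation
    return F.mse_loss(sim_aug, sim)

# As the abstract notes, this term alone admits collapsed solutions, so in practice it is
# combined with a non-collapse term (e.g. an InfoNCE or variance penalty), omitted here.
```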
PDF 22 pages
Scribble-supervised Cell Segmentation Using Multiscale Contrastive Regularization
Authors: Hyun-Jic Oh, Kanggeun Lee, Won-Ki Jeong
Current state-of-the-art supervised deep learning-based segmentation approaches have demonstrated superior performance on medical image segmentation tasks. However, such supervised approaches require fully annotated pixel-level ground-truth labels, which are labor-intensive and time-consuming to acquire. Recently, Scribble2Label (S2L) demonstrated that using only a handful of scribbles together with self-supervised learning can generate accurate segmentation results without full annotation. However, owing to the relatively small size of scribbles, the model is prone to overfitting, and the results may be biased toward the selection of scribbles. In this work, we address this issue by introducing a novel multiscale contrastive regularization term for S2L. The main idea is to extract features from intermediate layers of the neural network for the contrastive loss so that structures at various scales can be effectively separated. To verify the efficacy of our method, we conducted ablation studies on well-known datasets, such as Data Science Bowl 2018 and MoNuSeg. The results show that the proposed multiscale contrastive loss is effective in improving the performance of S2L, making it comparable to fully supervised segmentation methods.
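An illustrative sketch of a multiscale contrastive regularizer of the kind described (channel sizes, pooling, and projection heads are assumptions; this is not the paper's implementation): features from several intermediate layers are projected and contrasted between two views, and the per-scale losses are averaged into one regularization term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """Standard normalized-temperature cross-entropy over a batch of paired views."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

class MultiscaleContrast(nn.Module):
    def __init__(self, channels=(64, 128, 256), proj_dim=128):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(c, proj_dim) for c in channels)

    def forward(self, feats_a, feats_b):
        """feats_a/feats_b: lists of (B, C_i, H_i, W_i) intermediate maps from two views."""
        loss = 0.0
        for head, fa, fb in zip(self.heads, feats_a, feats_b):
            za = head(fa.mean(dim=(2, 3)))   # global-average-pool each scale, then project
            zb = head(fb.mean(dim=(2, 3)))
            loss = loss + nt_xent(za, zb)
        return loss / len(self.heads)
```

The resulting term would be added to the scribble-supervised segmentation loss with a weighting factor.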
PDF ISBI 2022 accepted
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Authors: Yangjun Mao, Jun Xiao, Dong Zhang, Meng Cao, Jian Shao, Yueting Zhuang, Long Chen
Distinctive Image Captioning (DIC), which aims to generate distinctive captions describing the unique details of a target image, has received considerable attention over the last few years. A recent DIC method proposes to generate distinctive captions by comparing the target image with a set of semantically similar reference images, i.e., reference-based DIC (Ref-DIC). It aims to force the generated captions to distinguish the target image from the reference images. To ensure that Ref-DIC models really perceive the unique objects (or attributes) in target images, we propose two new Ref-DIC benchmarks and develop a Transformer-based Ref-DIC baseline, TransDIC. The model not only extracts visual features from the target image but also encodes the differences between objects in the target and reference images. Taking one step further, we propose a stronger TransDIC++, which adds an extra contrastive learning module to make full use of the reference images. This new module is model-agnostic and can be easily incorporated into various Ref-DIC architectures. Finally, for more trustworthy benchmarking, we propose a new evaluation metric for Ref-DIC, named DisCIDEr, which evaluates both the accuracy and the distinctiveness of the generated captions. Experimental results demonstrate that TransDIC++ generates distinctive captions and outperforms several state-of-the-art models on the two new benchmarks across different metrics.
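A hedged sketch of a model-agnostic reference-contrastive term consistent with the abstract's description (the feature names, pairing scheme, and temperature are assumptions, not TransDIC++'s actual module): the generated-caption representation is pulled toward its own target-image features and pushed away from the reference-image features.

```python
import torch
import torch.nn.functional as F

def reference_contrastive_loss(caption_feat, target_feat, ref_feats, temperature=0.07):
    """caption_feat, target_feat: (B, D); ref_feats: (B, K, D) for K references per target."""
    c = F.normalize(caption_feat, dim=-1)
    t = F.normalize(target_feat, dim=-1)
    r = F.normalize(ref_feats, dim=-1)
    pos = (c * t).sum(-1, keepdim=True)      # caption should match its own target image
    neg = torch.einsum('bd,bkd->bk', c, r)   # ... and not the semantically similar references
    logits = torch.cat([pos, neg], dim=1) / temperature
    labels = torch.zeros(c.size(0), dtype=torch.long, device=c.device)
    return F.cross_entropy(logits, labels)
```

Because it operates only on pooled feature vectors, a term of this form could be attached to different Ref-DIC backbones, which is what "model-agnostic" suggests.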
PDF arXiv admin note: substantial text overlap with arXiv:2207.11118
Multi-Scale Cross Contrastive Learning for Semi-Supervised Medical Image Segmentation
Authors: Qianying Liu, Xiao Gu, Paul Henderson, Fani Deligianni
Semi-supervised learning has demonstrated great potential in medical image segmentation by utilizing knowledge from unlabeled data. However, most existing approaches do not explicitly capture high-level semantic relations between distant regions, which limits their performance. In this paper, we focus on representation learning for semi-supervised segmentation by developing a novel Multi-Scale Cross Supervised Contrastive Learning (MCSC) framework to segment structures in medical images. We jointly train CNN and Transformer models, regularising their features to be semantically consistent across different scales. Our approach contrasts multi-scale features based on ground-truth and cross-predicted labels in order to extract robust feature representations that reflect intra- and inter-slice relationships across the whole dataset. To tackle class imbalance, we take the prevalence of each class into account to guide contrastive learning and ensure that features adequately capture infrequent classes. Extensive experiments on two multi-structure medical segmentation datasets demonstrate the effectiveness of MCSC: it not only outperforms state-of-the-art semi-supervised methods by more than 3.0% in Dice, but also greatly reduces the performance gap with fully supervised methods.
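A simplified sketch of one plausible form of the class-prevalence-weighted, cross-branch contrast described above (prototype construction and weighting are my assumptions, not the authors' code): per-class prototypes from the CNN and Transformer branches are contrasted, with rarer classes up-weighted.

```python
import torch
import torch.nn.functional as F

def class_prototypes(features, labels, num_classes):
    """features: (B, D, H, W); labels: (B, H, W) long tensor of ground-truth or cross-predicted ids."""
    B, D, H, W = features.shape
    flat = features.permute(0, 2, 3, 1).reshape(-1, D)
    lab = labels.reshape(-1)
    protos = torch.zeros(num_classes, D, device=features.device)
    counts = torch.zeros(num_classes, device=features.device)
    protos.index_add_(0, lab, flat)
    counts.index_add_(0, lab, torch.ones_like(lab, dtype=torch.float))
    return protos / counts.clamp(min=1).unsqueeze(1), counts

def prevalence_weighted_contrast(feat_cnn, feat_trans, labels, num_classes, temperature=0.1):
    """Match each class prototype of the CNN branch to the same class in the Transformer branch."""
    p_cnn, counts = class_prototypes(feat_cnn, labels, num_classes)
    p_trans, _ = class_prototypes(feat_trans, labels, num_classes)
    z1, z2 = F.normalize(p_cnn, dim=1), F.normalize(p_trans, dim=1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(num_classes, device=logits.device)
    weights = counts.sum() / counts.clamp(min=1)       # rarer classes get larger weight
    per_class = F.cross_entropy(logits, targets, reduction='none')
    # Classes absent from the batch would need masking in practice; omitted for brevity.
    return (weights * per_class).sum() / weights.sum()
```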
Histopathology Image Classification using Deep Manifold Contrastive Learning
Authors: Jing Wei Tan, Won-Ki Jeong
Contrastive learning has gained popularity owing to its robust feature representation performance. However, cosine distance, the similarity metric commonly used in contrastive learning, is not well suited to representing the distance between two data points, especially on a nonlinear feature manifold. Inspired by manifold learning, we propose a novel extension of contrastive learning that leverages the geodesic distance between features as the similarity metric for histopathology whole-slide image classification. To reduce the computational overhead of manifold learning, we propose geodesic-distance-based feature clustering for efficient contrastive loss evaluation using prototypes, without time-consuming pairwise feature similarity comparisons. The efficacy of the proposed method is evaluated on two real-world histopathology image datasets. The results demonstrate that our method outperforms state-of-the-art cosine-distance-based contrastive learning methods.
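A hedged sketch of how geodesic distances and prototypes could be obtained (an Isomap-style k-NN shortest-path approximation with farthest-point prototype selection; these specific choices are assumptions, not the paper's method): the resulting feature-to-prototype distances can then drive a prototype-based contrastive loss without any pairwise feature comparison at loss time.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def geodesic_distances(features, k=10):
    """features: (N, D) array; returns (N, N) shortest-path distances on a k-NN graph."""
    graph = kneighbors_graph(features, n_neighbors=k, mode='distance')
    return shortest_path(graph, method='D', directed=False)

def geodesic_prototypes(features, num_prototypes=16, k=10):
    """Pick prototypes that are spread out in geodesic (not cosine) distance."""
    dist = geodesic_distances(features, k)
    dist[~np.isfinite(dist)] = dist[np.isfinite(dist)].max()   # handle disconnected components
    chosen = [0]
    for _ in range(num_prototypes - 1):
        chosen.append(int(np.argmax(dist[chosen].min(axis=0))))  # farthest-point sampling
    return features[chosen], dist[:, chosen]   # prototypes and feature-to-prototype distances

feats = np.random.randn(200, 64).astype(np.float32)
protos, d = geodesic_prototypes(feats)
print(protos.shape, d.shape)   # (16, 64) (200, 16)
```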