Unsupervised / Semi-supervised / Contrastive Learning


Updated 2022-11-05

Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese

Authors: An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou

The tremendous success of CLIP (Radford et al., 2021) has promoted research on and applications of contrastive learning for vision-language pretraining. In this work, we construct a large-scale dataset of Chinese image-text pairs, most of which are retrieved from publicly available datasets, and we pretrain Chinese CLIP models on the new dataset. We develop 5 Chinese CLIP models of multiple sizes, ranging from 77 to 958 million parameters. Furthermore, we propose a two-stage pretraining method, in which the model is first trained with the image encoder frozen and then trained with all parameters optimized, to improve model performance. Our comprehensive experiments demonstrate that Chinese CLIP achieves state-of-the-art performance on MUGE, Flickr30K-CN, and COCO-CN in both zero-shot and finetuning setups, and that it achieves competitive performance in zero-shot image classification on the ELEVATER benchmark (Li et al., 2022). We have released our code, models, and demos at https://github.com/OFA-Sys/Chinese-CLIP
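The abstract does not spell out the training objective, so the following is only a minimal sketch of a CLIP-style symmetric contrastive loss plus the two-stage freezing schedule it describes. The function names (`clip_contrastive_loss`, `trainable_groups`), the parameter-group names, and the temperature value are illustrative assumptions, not taken from the Chinese-CLIP code base.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy_row(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch where pair i is (image i, text i).

    The temperature is an assumed placeholder value.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: scaled cosine similarity between image i and text j.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text: each row should pick out its own caption.
    loss_i2t = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column should pick out its own image.
    loss_t2i = sum(
        cross_entropy_row([logits[i][j] for i in range(n)], j)
        for j in range(n)
    ) / n
    return 0.5 * (loss_i2t + loss_t2i)


def trainable_groups(stage):
    """Hypothetical two-stage schedule: which parameter groups get gradients."""
    if stage == 1:
        # Stage 1: image encoder frozen, only the text side is trained.
        return {"image_encoder": False, "text_encoder": True}
    if stage == 2:
        # Stage 2: all parameters are optimized.
        return {"image_encoder": True, "text_encoder": True}
    raise ValueError("stage must be 1 or 2")
```

With correctly paired embeddings the loss is close to zero, and it grows when the captions are shuffled relative to the images, which is the signal that pulls matching image-text pairs together.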


Continual Contrastive Learning for Image Classification

Authors: Zhiwei Lin, Yongtao Wang, Hongxiang Lin

Recently, self-supervised representation learning has driven further progress in multimedia technology. Most existing self-supervised learning methods are designed for packaged data. However, when applied to streamed data, they suffer from catastrophic forgetting, a problem that has not been studied extensively. In this paper, we make the first attempt to tackle catastrophic forgetting in mainstream self-supervised methods, i.e., contrastive learning methods. Specifically, we first develop a rehearsal-based framework combined with a novel sampling strategy and self-supervised knowledge distillation to transfer information over time efficiently. Then, we propose an extra sample queue to help the network separate the feature representations of old and new data in the embedding space. Experimental results show that, compared with the naive self-supervised baseline, which learns tasks one by one without any additional technique, we improve image classification accuracy by 1.60% on CIFAR-100, 2.86% on ImageNet-Sub, and 1.29% on ImageNet-Full under the 10-incremental-step setting. Our code will be available at https://github.com/VDIGPKU/ContinualContrastiveLearning.
Accepted at ICME 2022
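As a rough illustration of the two ingredients named in the abstract, a rehearsal memory and an extra sample queue, here is a minimal sketch. The paper's sampling strategy is not described in the abstract, so plain uniform random sampling stands in for it; the class names, the per-task budget, and the use of the queue as a store of old-data features are all assumptions made for illustration.

```python
import random
from collections import deque


class RehearsalBuffer:
    """Keeps a small subset of every finished task for replay.

    Uniform random selection is a stand-in, not the paper's strategy.
    """

    def __init__(self, per_task, seed=0):
        self.per_task = per_task
        self.memory = []
        self.rng = random.Random(seed)

    def add_task(self, samples):
        # Store at most `per_task` samples from the task that just ended.
        k = min(self.per_task, len(samples))
        self.memory.extend(self.rng.sample(samples, k))

    def replay(self, batch_size):
        # Draw old samples to mix into the current task's batch.
        k = min(batch_size, len(self.memory))
        return self.rng.sample(self.memory, k)


class SampleQueue:
    """Fixed-size FIFO of features from old data (assumed usage)."""

    def __init__(self, max_size):
        self.items = deque(maxlen=max_size)

    def push(self, features):
        # The oldest entries are dropped once the queue is full.
        self.items.extend(features)

    def contents(self):
        return list(self.items)
```

In a training loop, each new-task batch would be combined with `replay(...)` samples, and the queue contents could serve as additional contrast samples for separating old and new data in the embedding space.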


Unsupervised Deraining: Where Asymmetric Contrastive Learning Meets Self-similarity

Authors: Yi Chang, Yun Guo, Yuntong Ye, Changfeng Yu, Lin Zhu, Xile Zhao, Luxin Yan, Yonghong Tian

Most existing learning-based deraining methods are trained in a supervised manner on synthetic rainy-clean pairs. The domain gap between synthetic and real rain makes them generalize poorly to complex real rainy scenes. Moreover, existing methods mainly exploit the properties of the image layer or the rain layer independently, while few have considered their mutually exclusive relationship. To address this dilemma, we explore the intrinsic intra-similarity within each layer and the inter-exclusiveness between the two layers, and propose an unsupervised non-local contrastive learning (NLCL) deraining method. Non-local self-similar image patches, as positives, are pulled tightly together, while rain patches, as negatives, are pushed far away, and vice versa. On the one hand, the intrinsic self-similarity within the positive/negative samples of each layer helps us discover a more compact representation; on the other hand, the mutually exclusive property between the two layers enriches the discriminative decomposition. Thus, the internal self-similarity within each layer (similarity) and the external exclusive relationship between the two layers (dissimilarity) jointly serve as a generic image prior that lets us differentiate rain from the clean image without supervision. We further discover that the intrinsic dimension of the non-local image patches is generally higher than that of the rain patches. This motivates us to design an asymmetric contrastive loss that precisely models the compactness discrepancy of the two layers for better discriminative decomposition. In addition, considering that existing real rain datasets are of low quality, being either small in scale or downloaded from the internet, we collect a large-scale real dataset under various kinds of rainy weather that contains high-resolution rainy images.
16 pages, 15 figures. arXiv admin note: substantial text overlap with arXiv:2203.11509
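The abstract does not give the form of the asymmetric contrastive loss. The sketch below shows one simple way such an asymmetry could be expressed: an InfoNCE term per layer, each with its own temperature. It is not the paper's formulation; the function names, the choice of the first patch as anchor, and both temperature values are arbitrary placeholders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positives, negatives, temperature):
    """-log of the share of similarity mass that falls on the positives."""
    pos = [cosine(anchor, p) / temperature for p in positives]
    neg = [cosine(anchor, n) / temperature for n in negatives]
    m = max(pos + neg)  # subtract the max for numerical stability
    log_pos = m + math.log(sum(math.exp(s - m) for s in pos))
    log_all = m + math.log(sum(math.exp(s - m) for s in pos + neg))
    return log_all - log_pos


def asymmetric_loss(image_patches, rain_patches, tau_image=0.2, tau_rain=0.1):
    """Two InfoNCE terms with different (assumed) temperatures per layer.

    Image layer: image patches attract, rain patches repel.
    Rain layer: rain patches attract, image patches repel ("vice versa").
    Each list needs at least two patches: one anchor plus positives.
    """
    image_term = info_nce(image_patches[0], image_patches[1:],
                          rain_patches, tau_image)
    rain_term = info_nce(rain_patches[0], rain_patches[1:],
                         image_patches, tau_rain)
    return image_term + rain_term
```

Using separate temperatures lets the two layers be pulled together with different tightness, which is one way to reflect a compactness gap between image and rain patches.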


Post author: 木子已
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit 木子已 as the source when reposting.