Updated 2022-07-13
Looking Beyond Corners: Contrastive Learning of Visual Representations for Keypoint Detection and Description Extraction
Authors: Henrique Siqueira, Patrick Ruhkamp, Ibrahim Halfaoui, Markus Karmann, Onay Urfalioglu
Learnable keypoint detectors and descriptors are beginning to outperform classical hand-crafted feature extraction methods. Recent studies on self-supervised learning of visual representations have driven the increasing performance of learnable models based on deep networks. By leveraging traditional data augmentations and homography transformations, these networks learn to detect corners under adverse conditions such as extreme illumination changes. However, their generalization capabilities are limited to corner-like features detected a priori by classical methods or synthetically generated data. In this paper, we propose the Correspondence Network (CorrNet), which learns to detect repeatable keypoints and to extract discriminative descriptions via unsupervised contrastive learning under spatial constraints. Our experiments show that, through our proposed joint guided backpropagation of their latent space, CorrNet detects not only low-level features such as corners but also high-level features that represent similar objects present in a pair of input images. Our approach obtains competitive results under viewpoint changes and achieves state-of-the-art performance under illumination changes.
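As an illustration of the mechanism the abstract describes, the following is a minimal, hypothetical PyTorch sketch of joint guided backpropagation of a shared latent space: the similarity between the two images' latent codes is backpropagated to both inputs with guided ReLU gradients, and the per-pixel gradient magnitude is read as a keypoint saliency map. The encoder, function names, and saliency readout (`TinyEncoder`, `joint_guided_backprop`, channel-wise max of the gradient) are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): joint guided backpropagation of a
# shared latent space for an image pair, read out as keypoint saliency maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Small stand-in CNN encoder producing one global latent vector per image."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return F.adaptive_avg_pool2d(self.features(x), 1).flatten(1)

def _guided_relu_hook(module, grad_in, grad_out):
    # Guided backprop: let only positive gradients flow back through ReLUs.
    return (torch.clamp(grad_in[0], min=0.0),)

def joint_guided_backprop(encoder, img_a, img_b):
    """Backpropagate the similarity of the two latent codes to both inputs;
    the per-pixel gradient magnitude acts as a joint keypoint saliency map."""
    hooks = [m.register_full_backward_hook(_guided_relu_hook)
             for m in encoder.modules() if isinstance(m, nn.ReLU)]
    img_a = img_a.clone().requires_grad_(True)
    img_b = img_b.clone().requires_grad_(True)
    similarity = F.cosine_similarity(encoder(img_a), encoder(img_b)).sum()
    similarity.backward()
    for h in hooks:
        h.remove()
    # Channel-wise max of |gradient| gives one saliency map per image.
    return img_a.grad.abs().amax(1), img_b.grad.abs().amax(1)

if __name__ == "__main__":
    enc = TinyEncoder()
    a, b = torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)
    sal_a, sal_b = joint_guided_backprop(enc, a, b)
    print(sal_a.shape, sal_b.shape)  # torch.Size([1, 128, 128]) each
```

Candidate keypoints could then be selected from such saliency maps, e.g. by thresholding or non-maximum suppression, though the paper's actual detection pipeline may differ.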
PDF (Accepted at IEEE WCCI 2022)
Click here to view paper screenshots
Vision Transformer for Contrastive Clustering
Authors: Hua-Bao Ling, Bowen Zhu, Dong Huang, Ding-Hua Chen, Chang-Dong Wang, Jian-Huang Lai
Vision Transformer (ViT) has shown advantages over convolutional neural networks (CNNs) thanks to its ability to capture global long-range dependencies for visual representation learning. Besides ViT, contrastive learning has recently become another popular research topic. While previous contrastive learning works are mostly based on CNNs, some recent studies have attempted to combine ViT and contrastive learning for enhanced self-supervised learning. Despite the considerable progress, these combinations of ViT and contrastive learning mostly focus on instance-level contrastiveness, often overlooking global contrastiveness and lacking the ability to directly learn the clustering result (e.g., for images). In view of this, this paper presents a novel deep clustering approach termed Vision Transformer for Contrastive Clustering (VTCC), which, to our knowledge, unifies the Transformer and contrastive learning for the image clustering task for the first time. Specifically, with two random augmentations performed on each image, we utilize a ViT encoder with two weight-sharing views as the backbone. To remedy the potential instability of the ViT, we incorporate a convolutional stem, which uses multiple stacked small convolutions instead of a single large convolution in the patch projection layer, to split each augmented sample into a sequence of patches. After the backbone learns feature representations for the sequences of patches, an instance projector and a cluster projector are further utilized to perform instance-level contrastive learning and global clustering-structure learning, respectively. Experiments on eight image datasets demonstrate the stability (when training from scratch) and the superiority (in clustering performance) of our VTCC approach over the state-of-the-art.
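As a rough illustration of the instance projector / cluster projector design, here is a minimal PyTorch sketch in which a generic encoder stands in for the ViT backbone and its convolutional stem; the names (`DualHead`, `dual_contrastive_loss`, `nt_xent`) and the exact loss composition are assumptions, not the paper's implementation.

```python
# Hypothetical sketch (assumed names and loss composition): an instance projector
# and a cluster projector trained with two NT-Xent-style contrastive losses.
import torch
import torch.nn as nn
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over two paired batches (row i of z1 and z2 are positives)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)      # (2n, d)
    sim = z @ z.t() / tau                                   # cosine similarities / tau
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool, device=z.device), -1e9)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

class DualHead(nn.Module):
    """Instance projector (embeddings) and cluster projector (soft assignments)."""
    def __init__(self, feat_dim, proj_dim=128, n_clusters=10):
        super().__init__()
        self.instance = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                      nn.Linear(feat_dim, proj_dim))
        self.cluster = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                     nn.Linear(feat_dim, n_clusters),
                                     nn.Softmax(dim=1))

    def forward(self, h):
        return self.instance(h), self.cluster(h)

def dual_contrastive_loss(encoder, heads, view1, view2, tau=0.5):
    """Instance-level loss over rows plus cluster-level loss over the columns
    of the soft assignment matrices (each column represents one cluster)."""
    z1, c1 = heads(encoder(view1))
    z2, c2 = heads(encoder(view2))
    return nt_xent(z1, z2, tau) + nt_xent(c1.t(), c2.t(), tau)

if __name__ == "__main__":
    encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))  # toy stand-in for the ViT
    heads = DualHead(feat_dim=256)
    v1, v2 = torch.rand(8, 3, 32, 32), torch.rand(8, 3, 32, 32)
    print(dual_contrastive_loss(encoder, heads, v1, v2))
```

Contrastive-clustering formulations of this kind usually also add an entropy regularizer on the cluster assignments to avoid degenerate solutions; it is omitted here for brevity.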
PDF
Click here to view paper screenshots
Self-supervised Learning with Local Contrastive Loss for Detection and Semantic Segmentation
Authors: Ashraful Islam, Ben Lundell, Harpreet Sawhney, Sudipta Sinha, Peter Morales, Richard J. Radke
We present a self-supervised learning (SSL) method suitable for semi-global tasks such as object detection and semantic segmentation. We enforce local consistency between self-learned features that represent corresponding image locations of transformed versions of the same image, by minimizing a pixel-level local contrastive (LC) loss during training. The LC loss can be added to existing self-supervised learning methods with minimal overhead. We evaluate our SSL approach on two downstream tasks, object detection and semantic segmentation, using the COCO, PASCAL VOC, and Cityscapes datasets. Our method outperforms existing state-of-the-art SSL approaches by 1.9% on COCO object detection, 1.4% on PASCAL VOC detection, and 0.6% on Cityscapes segmentation.
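As a rough illustration of a pixel-level local contrastive loss of this kind, here is a minimal PyTorch sketch assuming the two dense feature maps are already spatially aligned; the function name (`local_contrastive_loss`), the random location sampling, and the negative selection are illustrative assumptions rather than the paper's exact LC loss.

```python
# Hypothetical sketch (assumed alignment, names, and sampling): a pixel-level
# local contrastive loss between dense features of two views of the same image.
import torch
import torch.nn.functional as F

def local_contrastive_loss(f1, f2, tau=0.2, n_samples=256):
    """f1, f2: (B, C, H, W) dense feature maps assumed to be spatially aligned
    (photometric augmentations, or geometric ones undone beforehand).
    Features at the same sampled location are positives; features at the other
    sampled locations of the paired view act as negatives."""
    B, C, H, W = f1.shape
    f1 = F.normalize(f1.flatten(2), dim=1)               # (B, C, H*W), unit-norm per location
    f2 = F.normalize(f2.flatten(2), dim=1)
    idx = torch.randint(0, H * W, (B, n_samples), device=f1.device)
    gather_idx = idx.unsqueeze(1).expand(-1, C, -1)      # (B, C, n_samples)
    q = f1.gather(2, gather_idx)                         # queries from view 1
    k = f2.gather(2, gather_idx)                         # keys from view 2
    logits = torch.einsum("bcm,bcn->bmn", q, k) / tau    # (B, n_samples, n_samples)
    targets = torch.arange(n_samples, device=f1.device).repeat(B, 1)
    # Class dim = keys, position dim = queries; occasional duplicate locations
    # from random sampling are ignored for simplicity in this sketch.
    return F.cross_entropy(logits.permute(0, 2, 1), targets)

if __name__ == "__main__":
    feats1, feats2 = torch.rand(2, 64, 28, 28), torch.rand(2, 64, 28, 28)
    print(local_contrastive_loss(feats1, feats2))
```

In practice, negatives could also be drawn from other images in the batch; the sketch keeps them within the same image pair for brevity.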
PDF