Detection / Segmentation / Tracking


Updated 2022-03-09

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

Authors:Yuchao Wang, Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Guoqiang Jin, Liwei Wu, Rui Zhao, Xinyi Le

The crux of semi-supervised semantic segmentation is to assign adequate pseudo-labels to the pixels of unlabeled images. A common practice is to select the highly confident predictions as the pseudo ground-truth, but this leads to the problem that most pixels may be left unused due to their unreliability. We argue that every pixel matters to the model training, even if its prediction is ambiguous. Intuitively, an unreliable prediction may get confused among the top classes (i.e., those with the highest probabilities); however, it should be confident that the pixel does not belong to the remaining classes. Hence, such a pixel can be convincingly treated as a negative sample for those most unlikely categories. Based on this insight, we develop an effective pipeline to make sufficient use of unlabeled data. Concretely, we separate reliable and unreliable pixels via the entropy of predictions, push each unreliable pixel into a category-wise queue that consists of negative samples, and manage to train the model with all candidate pixels. Considering the training evolution, where the prediction becomes more and more accurate, we adaptively adjust the threshold for the reliable-unreliable partition. Experimental results on various benchmarks and training settings demonstrate the superiority of our approach over the state-of-the-art alternatives.
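
The abstract outlines an entropy-based split of pixels into reliable and unreliable sets. Below is a minimal PyTorch sketch of that partition step, assuming a standard softmax segmentation head; the function and variable names are ours, not the authors', and the released code may differ.

```python
import torch

def partition_pixels(logits, percent=80.0):
    """Split pixels into reliable / unreliable sets by prediction entropy.

    logits: (B, C, H, W) raw segmentation outputs for unlabeled images.
    percent: keep the `percent` lowest-entropy pixels as reliable; the
             paper adaptively adjusts this threshold as training evolves.
    """
    prob = torch.softmax(logits, dim=1)
    entropy = -(prob * torch.log(prob + 1e-10)).sum(dim=1)   # (B, H, W)
    thresh = torch.quantile(entropy.flatten(), percent / 100.0)
    reliable = entropy <= thresh    # kept as pseudo ground-truth
    unreliable = ~reliable          # pushed into per-class negative queues
    pseudo_label = prob.argmax(dim=1)
    return pseudo_label, reliable, unreliable
```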
PDF Accepted to CVPR 2022

Paper screenshots

Semantic Distillation Guided Salient Object Detection

Authors:Bo Xu, Guanze Liu, Han Huang, Cheng Lu, Yandong Guo

Most existing CNN-based salient object detection methods can identify local segmentation details like hair and animal fur, but often misinterpret the real saliency due to the lack of global contextual information, which stems from the subjectiveness of the SOD task and the locality of convolution layers. Moreover, due to unrealistically expensive labeling costs, the existing SOD datasets are insufficient to cover the real data distribution. The limitation and bias of the training data add additional difficulty to fully exploring the semantic associations between objects and between objects and their environment in a given image. In this paper, we propose a semantic distillation guided SOD (SDG-SOD) method that produces accurate results by fusing semantically distilled knowledge from generated image captions into a Vision-Transformer-based SOD framework. SDG-SOD can better uncover inter-object and object-to-environment saliency and bridge the gap between the subjective nature of SOD and its expensive labeling. Comprehensive experiments on five benchmark datasets demonstrate that SDG-SOD outperforms state-of-the-art approaches on four evaluation metrics, and largely improves model performance on the DUTS, ECSSD, DUT, HKU-IS, and PASCAL-S datasets.
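
The abstract does not spell out how the caption-distilled semantics enter the Vision-Transformer backbone; one plausible reading is a cross-attention fusion of caption embeddings into the patch tokens. The module below is a hypothetical PyTorch sketch of that idea, not the paper's actual architecture.

```python
import torch.nn as nn

class CaptionGuidedFusion(nn.Module):
    """Hypothetical block fusing caption-derived semantic embeddings
    into ViT patch features via cross-attention (our assumption)."""

    def __init__(self, dim=384, heads=6):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patch_tokens, caption_tokens):
        # patch_tokens:   (B, N, dim) visual tokens from the ViT backbone
        # caption_tokens: (B, M, dim) embeddings of generated caption words
        fused, _ = self.attn(query=patch_tokens,
                             key=caption_tokens,
                             value=caption_tokens)
        return self.norm(patch_tokens + fused)  # residual fusion
```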
PDF 14 pages, 10 figures

Paper screenshots

Weakly Supervised Semantic Segmentation using Out-of-Distribution Data

Authors:Jungbeom Lee, Seong Joon Oh, Sangdoo Yun, Junsuk Choe, Eunji Kim, Sungroh Yoon

Weakly supervised semantic segmentation (WSSS) methods are often built on pixel-level localization maps obtained from a classifier. However, trained on class labels only, classifiers suffer from the spurious correlation between foreground and background cues (e.g. train and rail), fundamentally bounding the performance of WSSS. There have been previous endeavors to address this issue with additional supervision. We propose a novel source of information to distinguish foreground from background: Out-of-Distribution (OoD) data, i.e., images devoid of the foreground object classes. In particular, we utilize the hard OoDs on which the classifier is likely to make false-positive predictions. These samples typically carry key visual features of the background (e.g. rail) that classifiers often confuse as foreground (e.g. train), so these cues let classifiers correctly suppress spurious background cues. Acquiring such hard OoDs does not require extensive annotation effort; it only incurs a small additional image-level labeling cost on top of the original effort to collect class labels. We propose a method, W-OoD, for utilizing the hard OoDs. W-OoD achieves state-of-the-art performance on Pascal VOC 2012.
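
One straightforward way to realize "hard OoD images as pure negatives" is to supervise them with an all-zero multi-label target, so the classifier learns to suppress background cues such as rails. The loss below is a hedged sketch of that idea; the actual W-OoD objective may differ.

```python
import torch
import torch.nn.functional as F

def wood_style_loss(logits_in, labels_in, logits_ood):
    """Sketch of training a multi-label classifier with hard OoD negatives.

    logits_in:  (B, C) logits for in-distribution images
    labels_in:  (B, C) binary class labels
    logits_ood: (B_ood, C) logits for hard OoD images that contain no
                foreground class; their target is all zeros, pushing the
                classifier to stop firing on spurious background cues.
    """
    loss_in = F.binary_cross_entropy_with_logits(logits_in, labels_in)
    loss_ood = F.binary_cross_entropy_with_logits(
        logits_ood, torch.zeros_like(logits_ood))
    return loss_in + loss_ood
```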
PDF CVPR 2022

Paper screenshots

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

Authors:Mingfei Gao, Chen Xing, Juan Carlos Niebles, Junnan Li, Ran Xu, Wenhao Liu, Caiming Xiong

Despite great progress in object detection, most existing methods work only on a limited set of object categories, due to the tremendous human effort needed for instance-level bounding-box annotations of training data. To alleviate the problem, recent open vocabulary and zero-shot detection methods attempt to detect novel object categories beyond those seen during training. They achieve this goal by training on pre-defined base categories to induce generalization to novel objects. However, their potential is still constrained by the small set of base categories available for training. To enlarge the set of base classes, we propose a method that automatically generates pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs. Our method leverages the localization ability of pre-trained vision-language models to generate pseudo bounding-box labels and then directly uses them to train object detectors. Experimental results show that our method outperforms the state-of-the-art (SOTA) open vocabulary object detector by 8% AP on COCO novel categories, by 6.3% AP on PASCAL VOC, by 2.3% AP on Objects365, and by 2.8% AP on LVIS. Code is available at https://github.com/salesforce/PB-OVD.
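
A minimal version of the box-generation step could threshold a word-to-region activation map produced by a pre-trained vision-language model and take the tight box around the activated region. The helper below sketches that last step only; it is our simplification, not the paper's full pipeline.

```python
import numpy as np

def activation_to_box(act_map, thresh=0.5):
    """Turn a (H, W) activation map for one caption word into a single
    pseudo bounding box (x_min, y_min, x_max, y_max), or None if the
    map never crosses the relative threshold. Hypothetical helper."""
    mask = act_map >= thresh * act_map.max()
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```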
PDF

Paper screenshots

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

Authors:Tao Zhang, Shiqing Wei, Shunping Ji

Contour-based instance segmentation methods have developed rapidly in recent years, but they rely on a rough, hand-crafted front-end contour initialization, which restricts model performance, and on an empirical, fixed back-end pairing of predicted and label vertices, which adds to the learning difficulty. In this paper, we introduce a novel contour-based method, named E2EC, for high-quality instance segmentation. First, E2EC applies a learnable contour initialization architecture instead of hand-crafted contour initialization. It consists of a contour initialization module that constructs more explicit learning goals and a global contour deformation module that better exploits all of the vertices' features. Second, we propose a novel label sampling scheme, named multi-direction alignment, to reduce the learning difficulty. Third, to improve the quality of boundary details, we dynamically match the most appropriate predicted-ground-truth vertex pairs and propose the corresponding loss function, named dynamic matching loss. Experiments show that E2EC achieves state-of-the-art performance on the KITTI INStance (KINS) dataset, the Semantic Boundaries Dataset (SBD), Cityscapes, and COCO. E2EC is also efficient enough for real-time applications, with an inference speed of 36 fps for 512×512 images on an NVIDIA A6000 GPU. Code will be released at https://github.com/zhang-tao-whu/e2ec.
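
The dynamic matching loss pairs each ground-truth vertex with the currently most appropriate prediction rather than a fixed index. A bare-bones PyTorch sketch of one such nearest-vertex matching (our reading of the abstract, not the released implementation) is shown below.

```python
import torch
import torch.nn.functional as F

def dynamic_matching_loss(pred, gt):
    """Sketch of a dynamic vertex-matching regression loss.

    pred, gt: (N, 2) contour vertices. Instead of a fixed index pairing,
    each ground-truth vertex is matched to its currently nearest
    prediction before the regression loss is applied.
    """
    d = torch.cdist(gt, pred)   # (N, N) pairwise vertex distances
    idx = d.argmin(dim=1)       # nearest prediction per GT vertex
    return F.smooth_l1_loss(pred[idx], gt)
```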
PDF CVPR2022

Paper screenshots

Seeing BDD100K in dark: Single-Stage Night-time Object Detection via Continual Fourier Contrastive Learning

Authors:Ujjal Kr Dutta

In this paper, we study the lesser-explored avenue of object detection at night. An object detector trained on abundant labeled daytime images often fails to perform well on night images due to the domain gap. As collecting more labeled data at night is expensive, unpaired generative image translation techniques seek to synthesize night-time images; however, unrealistic artifacts often arise in the synthetic images. Illuminating night-time inference images also does not work well in practice, as shown in our paper. To address these issues, we suggest a novel technique for enhancing the object detector via Contrastive Learning, which tries to group together embeddings of similar images. To provide anchor-positive image pairs for Contrastive Learning, we leverage the Fourier transform, which is naturally good at preserving the semantics of an image. For practical benefits in real-time applications, we choose the recently proposed YOLOF single-stage detector, which provides a simple and clean encoder-decoder segregation of the detector network. However, merely teaching the encoder to perform well on the auxiliary Contrastive Learning task may lead to catastrophic forgetting of the knowledge essential for object detection. Hence, we train the encoder in a Continual Learning fashion. Our novel method, built on an elegant training framework, achieves state-of-the-art performance on the large-scale BDD100K dataset in a uniform setting chosen, to the best of our knowledge, for the first time.
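
For the Fourier-based anchor-positive pairs, one common recipe (e.g., FDA-style mixing) keeps an image's phase, which carries semantics, while swapping in another image's low-frequency amplitude, which carries illumination and style. The sketch below assumes that recipe; the paper's exact construction may differ.

```python
import torch

def fourier_positive(img_day, img_night, beta=0.05):
    """Build a semantics-preserving positive for img_day by swapping in
    img_night's low-frequency amplitude (style) while keeping img_day's
    phase (content). img_day, img_night: (C, H, W) float tensors."""
    fd = torch.fft.fftshift(torch.fft.fft2(img_day), dim=(-2, -1))
    fn = torch.fft.fftshift(torch.fft.fft2(img_night), dim=(-2, -1))
    amp, pha = fd.abs(), fd.angle()
    _, h, w = img_day.shape
    cy, cx, bh, bw = h // 2, w // 2, int(h * beta), int(w * beta)
    # Replace the centered low-frequency band of the amplitude spectrum.
    amp[:, cy - bh:cy + bh, cx - bw:cx + bw] = \
        fn.abs()[:, cy - bh:cy + bh, cx - bw:cx + bw]
    mixed = torch.fft.ifft2(
        torch.fft.ifftshift(amp * torch.exp(1j * pha), dim=(-2, -1)))
    return mixed.real
```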
PDF

论文截图

Author: Harvey
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Harvey when reposting!