2023-11-28 更新
PointOBB: Learning Oriented Object Detection via Single Point Supervision
Authors:Junwei Luo, Xue Yang, Yi Yu, Qingyun Li, Junchi Yan, Yansheng Li
Single point-supervised object detection is gaining attention due to its cost-effectiveness. However, existing approaches focus on generating horizontal bounding boxes (HBBs) while ignoring oriented bounding boxes (OBBs) commonly used for objects in aerial images. This paper proposes PointOBB, the first single Point-based OBB generation method, for oriented object detection. PointOBB operates through the collaborative utilization of three distinctive views: an original view, a resized view, and a rotated/flipped (rot/flp) view. Upon the original view, we leverage the resized and rot/flp views to build a scale augmentation module and an angle acquisition module, respectively. In the former module, a Scale-Sensitive Consistency (SSC) loss is designed to enhance the deep network’s ability to perceive the object scale. For accurate object angle predictions, the latter module incorporates self-supervised learning to predict angles, which is associated with a scale-guided Dense-to-Sparse (DS) matching strategy for aggregating dense angles corresponding to sparse objects. The resized and rot/flp views are switched using a progressive multi-view switching strategy during training to achieve coupled optimization of scale and angle. Experimental results on the DIOR-R and DOTA-v1.0 datasets demonstrate that PointOBB achieves promising performance, and significantly outperforms potential point-supervised baselines.
PDF 11 pages,5 figures, 6 tables. Code: https://github.com/Luo-Z13/pointobb
点此查看论文截图
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
Authors:Yu Yi, Xue Yang, Qingyun Li, Feipeng Da, Junchi Yan, Jifeng Dai, Yu Qiao
With the rapidly increasing demand for oriented object detection (OOD), recent research involving weakly-supervised detectors for learning rotated box (RBox) from the horizontal box (HBox) has attracted more and more attention. In this paper, we explore a more challenging yet label-efficient setting, namely single point-supervised OOD, and present our approach called Point2RBox. Specifically, we propose to leverage two principles: 1) Synthetic pattern knowledge combination: By sampling around each labelled point on the image, we transfer the object feature to synthetic visual patterns with the known bounding box to provide the knowledge for box regression. 2) Transform self-supervision: With a transformed input image (e.g. scaled/rotated), the output RBoxes are trained to follow the same transformation so that the network can perceive the relative size/rotation between objects. The detector is further enhanced by a few devised techniques to cope with peripheral issues, e.g. the anchor/layer assignment as the size of the object is not available in our point supervision setting. To our best knowledge, Point2RBox is the first end-to-end solution for point-supervised OOD. In particular, our method uses a lightweight paradigm, yet it achieves a competitive performance among point-supervised alternatives, 41.05%/27.62%/80.01% on DOTA/DIOR/HRSC datasets.
PDF 11 pages, 3 figures, 5 tables, code: https://github.com/open-mmlab/mmrotate
点此查看论文截图
Can SAM recognize crops? Quantifying the zero-shot performance of a semantic segmentation foundation model on generating crop-type maps using satellite imagery for precision agriculture
Authors:Rutuja Gurav, Het Patel, Zhuocheng Shang, Ahmed Eldawy, Jia Chen, Elia Scudiero, Evangelos Papalexakis
Climate change is increasingly disrupting worldwide agriculture, making global food production less reliable.To tackle the growing challenges in feeding the planet, cutting-edge management strategies, such as precision agriculture, empower farmers and decision-makers with rich and actionable information to increase the efficiency and sustainability of their farming practices.Crop-type maps are key information for decision-support tools but are challenging and costly to generate.We investigate the capabilities of Meta AI’s Segment Anything Model (SAM) for crop-map prediction task, acknowledging its recent successes at zero-shot image segmentation.However, SAM being limited to up-to 3 channel inputs and its zero-shot usage being class-agnostic in nature pose unique challenges in using it directly for crop-type mapping.We propose using clustering consensus metrics to assess SAM’s zero-shot performance in segmenting satellite imagery and producing crop-type maps.Although direct crop-type mapping is challenging using SAM in zero-shot setting, experiments reveal SAM’s potential for swiftly and accurately outlining fields in satellite images, serving as a foundation for subsequent crop classification.This paper attempts to highlight a use-case of state-of-the-art image segmentation models like SAM for crop-type mapping and related specific needs of the agriculture industry, offering a potential avenue for automatic, efficient, and cost-effective data products for precision agriculture practices.
PDF
点此查看论文截图
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Authors:Bin Xie, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang
Open-vocabulary semantic segmentation strives to distinguish pixels into different semantic groups from an open set of categories. Most existing methods explore utilizing pre-trained vision-language models, in which the key is to adopt the image-level model for pixel-level segmentation task. In this paper, we propose a simple encoder-decoder, named SED, for open-vocabulary semantic segmentation, which comprises a hierarchical encoder-based cost map generation and a gradual fusion decoder with category early rejection. The hierarchical encoder-based cost map generation employs hierarchical backbone, instead of plain transformer, to predict pixel-level image-text cost map. Compared to plain transformer, hierarchical backbone better captures local spatial information and has linear computational complexity with respect to input size. Our gradual fusion decoder employs a top-down structure to combine cost map and the feature maps of different backbone levels for segmentation. To accelerate inference speed, we introduce a category early rejection scheme in the decoder that rejects many no-existing categories at the early layer of decoder, resulting in at most 4.7 times acceleration without accuracy degradation. Experiments are performed on multiple open-vocabulary semantic segmentation datasets, which demonstrates the efficacy of our SED method. When using ConvNeXt-B, our SED method achieves mIoU score of 31.6\% on ADE20K with 150 categories at 82 millisecond ($ms$) per image on a single A6000. We will release it at \url{https://github.com/xb534/SED.git}.
PDF
点此查看论文截图
2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation
Authors:Ozan Unal, Dengxin Dai, Lukas Hoyer, Yigit Baran Can, Luc Van Gool
As 3D perception problems grow in popularity and the need for large-scale labeled datasets for LiDAR semantic segmentation increase, new methods arise that aim to reduce the necessity for dense annotations by employing weakly-supervised training. However these methods continue to show weak boundary estimation and high false negative rates for small objects and distant sparse regions. We argue that such weaknesses can be compensated by using RGB images which provide a denser representation of the scene. We propose an image-guidance network (IGNet) which builds upon the idea of distilling high level feature information from a domain adapted synthetically trained 2D semantic segmentation network. We further utilize a one-way contrastive learning scheme alongside a novel mixing strategy called FOVMix, to combat the horizontal field-of-view mismatch between the two sensors and enhance the effects of image guidance. IGNet achieves state-of-the-art results for weakly-supervised LiDAR semantic segmentation on ScribbleKITTI, boasting up to 98% relative performance to fully supervised training with only 8% labeled points, while introducing no additional annotation burden or computational/memory cost during inference. Furthermore, we show that our contributions also prove effective for semi-supervised training, where IGNet claims state-of-the-art results on both ScribbleKITTI and SemanticKITTI.
PDF Accepted at WACV 2024
点此查看论文截图
Unleashing the Power of Prompt-driven Nucleus Instance Segmentation
Authors:Zhongyi Shui, Yunlong Zhang, Kai Yao, Chenglu Zhu, Yuxuan Sun, Lin Yang
Nuclear instance segmentation in histology images is crucial for a broad spectrum of clinical applications. Current prevailing nuclear instance segmentation algorithms rely on regression of nuclei contours, distance maps, watershed markers or a proxy nuclear representation of star-convex polygons. Consequently, these methods necessitate sophisticated post-processing operations to distinguish nuclei instances, which are commonly acknowledged to be error-prone and parameter-sensitive. Recently, the segment anything model (SAM) has earned attracted huge attention within the domain of medical image segmentation due to its impressive generalization ability and promptable property. Nevertheless, its potential on nuclear instance segmentation remains largely underexplored. In this paper, we present a novel prompt-driven framework that consists of a point prompter and a SAM for automatic nuclei instance segmentation. Specifically, the prompter learns to generate a unique point prompt for each nucleus while the SAM is fine tuned to output the corresponding mask of the cued nucleus. Furthermore, we propose to add adjacent nuclei as negative prompts to promote the model’s ability to recognize overlapping nuclei. Without bells and whistles, our proposed method sets a new state-of-the-art performance on three challenging benchmarks. Our code is available at \textcolor{magenta}{\url{https://github.com/windygoo/PromptNucSeg}} .
PDF