Open-Set


Updated 2023-03-13

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Authors: Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang

In this paper, we present an open-set object detector, called Grounding DINO, by marrying the Transformer-based detector DINO with grounded pre-training. It can detect arbitrary objects from human inputs such as category names or referring expressions. The key to open-set object detection is introducing language to a closed-set detector for open-set concept generalization. To effectively fuse the language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion. While previous works mainly evaluate open-set object detection on novel categories, we propose to also perform evaluations on referring expression comprehension for objects specified with attributes. Grounding DINO performs remarkably well on all three settings, including benchmarks on COCO, LVIS, ODinW, and RefCOCO/+/g. Grounding DINO achieves 52.5 AP on the COCO detection zero-shot transfer benchmark, i.e., without any training data from COCO. It sets a new record on the ODinW zero-shot benchmark with a mean of 26.1 AP. Code will be available at https://github.com/IDEA-Research/GroundingDINO.
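The abstract names a language-guided query selection step but does not spell it out. The following is a minimal sketch of one plausible reading, assuming each image token is scored by its highest dot-product similarity to any text token and the top-scoring tokens are kept as queries. The function name `select_queries`, the toy features, and the scoring rule are illustrative assumptions, not the authors' implementation.

```python
def select_queries(image_feats, text_feats, num_queries):
    """Return indices of the image tokens most similar to any text token.

    Hypothetical helper: scores each image token by its maximum
    dot-product similarity over all text tokens, then keeps the top ones.
    """
    scores = []
    for img in image_feats:
        # Dot product of this image token with every text token.
        sims = [sum(a * b for a, b in zip(img, txt)) for txt in text_feats]
        scores.append(max(sims))
    # Rank image tokens by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_queries]


# Toy 2-dimensional features (made up for illustration).
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text_feats = [[1.0, 0.0], [0.9, 0.1]]
top = select_queries(image_feats, text_feats, 2)
print(top)
```

In this toy input the first and third image tokens align best with the text, so they would be the ones passed on as queries.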


Boosting Open-Set Domain Adaptation with Threshold Self-Tuning and Cross-Domain Mixup

Authors: Xinghong Liu, Yi Zhou, Tao Zhou, Jie Qin, Shengcai Liao

Open-set domain adaptation (OSDA) aims not only to recognize target samples belonging to the common classes shared by the source and target domains but also to detect samples from unknown classes. Existing OSDA methods face two obstacles. First, most OSDA approaches require tedious manual tuning of a threshold hyperparameter to separate common and unknown classes, and a proper threshold is difficult to determine when the target domain data is unlabeled. Second, most OSDA methods rely only on the confidence values predicted by the model to distinguish common from unknown classes. Performance is unsatisfactory, especially when the majority of the target domain consists of unknown classes. Our experiments demonstrate that combining entropy, consistency, and confidence is a more reliable way of distinguishing common and unknown samples. In this paper, we design a novel threshold self-tuning and cross-domain mixup (TSCM) method to overcome these two drawbacks. TSCM automatically tunes a proper threshold using unlabeled target samples rather than relying on a manually set empirical hyperparameter. Our method considers multiple criteria instead of confidence alone and uses the self-generated threshold to separate common and unknown classes in the target domain. Furthermore, we introduce a cross-domain mixup method designed for OSDA scenarios to learn domain-invariant features in a more continuous latent space. Comprehensive experiments show that our method consistently achieves superior performance on different benchmarks compared with various state-of-the-art methods.
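To make the criteria in the abstract concrete, here is a minimal sketch of two of them (confidence and entropy) together with the generic mixup interpolation between a source and a target sample. It does not reproduce TSCM: the abstract does not define the consistency criterion, the threshold self-tuning rule, or the exact form of the cross-domain mixup, so those are left out, and all function names here are illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(probs):
    """Confidence is taken here as the largest predicted probability."""
    return max(probs)


def normalized_entropy(probs):
    """Shannon entropy divided by log(num_classes), so it lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def mixup(x_source, x_target, lam):
    """Generic mixup: convex combination of a source and a target sample."""
    return [lam * s + (1 - lam) * t for s, t in zip(x_source, x_target)]


# A peaked prediction (likely a common class) versus a flat one
# (a candidate unknown sample). Logits are made up for illustration.
peaked = softmax([5.0, 0.0, 0.0])
flat = softmax([0.0, 0.0, 0.0, 0.0])
print(confidence(peaked), normalized_entropy(peaked))
print(confidence(flat), normalized_entropy(flat))

mixed = mixup([1.0, 0.0], [0.0, 1.0], 0.25)
print(mixed)
```

A flat prediction has low confidence and high entropy, which is why the two signals can complement each other when flagging unknown-class samples.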


Author: 木子已
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit the source, 木子已, when reposting.