2022-11-28 更新
Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark
Authors:Floriana Ciaglia, Francesco Saverio Zuppichini, Paul Guerrie, Mark McQuade, Jacob Solawetz
The evaluation of object detection models is usually performed by optimizing a single metric, e.g. mAP, on a fixed set of datasets, e.g. Microsoft COCO and Pascal VOC. Due to image retrieval and annotation costs, these datasets consist largely of images found on the web and do not represent many real-life domains that are being modelled in practice, e.g. satellite, microscopic and gaming, making it difficult to assert the degree of generalization learned by the model. We introduce the Roboflow-100 (RF100) consisting of 100 datasets, 7 imagery domains, 224,714 images, and 805 class labels with over 11,170 labelling hours. We derived RF100 from over 90,000 public datasets, 60 million public images that are actively being assembled and labelled by computer vision practitioners in the open on the web application Roboflow Universe. By releasing RF100, we aim to provide a semantically diverse, multi-domain benchmark of datasets to help researchers test their model’s generalizability with real-life data. RF100 download and benchmark replication are available on GitHub.
PDF
点此查看论文截图
Contrastive pretraining for semantic segmentation is robust to noisy positive pairs
Authors:Sebastian Gerard, Josephine Sullivan
Domain-specific variants of contrastive learning can construct positive pairs from two distinct images, as opposed to augmenting the same image twice. Unlike in traditional contrastive methods, this can result in positive pairs not matching perfectly. Similar to false negative pairs, this could impede model performance. Surprisingly, we find that downstream semantic segmentation is either robust to the noisy pairs or even benefits from them. The experiments are conducted on the remote sensing dataset xBD, and a synthetic segmentation dataset, on which we have full control over the noise parameters. As a result, practitioners should be able to use such domain-specific contrastive methods without having to filter their positive pairs beforehand.
PDF 8 pages, 8 figures
点此查看论文截图
FLAIR #1: semantic segmentation and domain adaptation dataset
Authors:Anatol Garioud, Stéphane Peillet, Eva Bookjans, Sébastien Giordano, Boris Wattrelos
The French National Institute of Geographical and Forest Information (IGN) has the mission to document and measure land-cover on French territory and provides referential geographical datasets, including high-resolution aerial images and topographic maps. The monitoring of land-cover plays a crucial role in land management and planning initiatives, which can have significant socio-economic and environmental impact. Together with remote sensing technologies, artificial intelligence (IA) promises to become a powerful tool in determining land-cover and its evolution. IGN is currently exploring the potential of IA in the production of high-resolution land cover maps. Notably, deep learning methods are employed to obtain a semantic segmentation of aerial images. However, territories as large as France imply heterogeneous contexts: variations in landscapes and image acquisition make it challenging to provide uniform, reliable and accurate results across all of France. The FLAIR-one dataset presented is part of the dataset currently used at IGN to establish the French national reference land cover map “Occupation du sol `a grande \’echelle” (OCS- GE).
PDF
点此查看论文截图
ALBench: A Framework for Evaluating Active Learning in Object Detection
Authors:Zhanpeng Feng, Shiliang Zhang, Rinyoichi Takezoe, Wenze Hu, Manmohan Chandraker, Li-Jia Li, Vijay K. Narayanan, Xiaoyu Wang
Active learning is an important technology for automated machine learning systems. In contrast to Neural Architecture Search (NAS) which aims at automating neural network architecture design, active learning aims at automating training data selection. It is especially critical for training a long-tailed task, in which positive samples are sparsely distributed. Active learning alleviates the expensive data annotation issue through incrementally training models powered with efficient data selection. Instead of annotating all unlabeled samples, it iteratively selects and annotates the most valuable samples. Active learning has been popular in image classification, but has not been fully explored in object detection. Most of current approaches on object detection are evaluated with different settings, making it difficult to fairly compare their performance. To facilitate the research in this field, this paper contributes an active learning benchmark framework named as ALBench for evaluating active learning in object detection. Developed on an automatic deep model training system, this ALBench framework is easy-to-use, compatible with different active learning algorithms, and ensures the same training and testing protocols. We hope this automated benchmark system help researchers to easily reproduce literature’s performance and have objective comparisons with prior arts. The code will be release through Github.
PDF
点此查看论文截图
2022-11-28 更新
Object Detection in Foggy Scenes by Embedding Depth and Reconstruction into Domain Adaptation
Authors:Xin Yang, Michael Bi Mi, Yuan Yuan, Xin Wang, Robby T. Tan
Most existing domain adaptation (DA) methods align the features based on the domain feature distributions and ignore aspects related to fog, background and target objects, rendering suboptimal performance. In our DA framework, we retain the depth and background information during the domain feature alignment. A consistency loss between the generated depth and fog transmission map is introduced to strengthen the retention of the depth information in the aligned features. To address false object features potentially generated during the DA process, we propose an encoder-decoder framework to reconstruct the fog-free background image. This reconstruction loss also reinforces the encoder, i.e., our DA backbone, to minimize false object features.Moreover, we involve our target data in training both our DA module and our detection module in a semi-supervised manner, so that our detection module is also exposed to the unlabeled target data, the type of data used in the testing stage. Using these ideas, our method significantly outperforms the state-of-the-art method (47.6 mAP against the 44.3 mAP on the Foggy Cityscapes dataset), and obtains the best performance on multiple real-image public datasets. Code is available at: https://github.com/VIML-CVDL/Object-Detection-in-Foggy-Scenes
PDF Accepted by ACCV
点此查看论文截图
CRAFT: Camera-Radar 3D Object Detection with Spatio-Contextual Fusion Transformer
Authors:Youngseok Kim, Sanmin Kim, Jun Won Choi, Dongsuk Kum
Camera and radar sensors have significant advantages in cost, reliability, and maintenance compared to LiDAR. Existing fusion methods often fuse the outputs of single modalities at the result-level, called the late fusion strategy. This can benefit from using off-the-shelf single sensor detection algorithms, but late fusion cannot fully exploit the complementary properties of sensors, thus having limited performance despite the huge potential of camera-radar fusion. Here we propose a novel proposal-level early fusion approach that effectively exploits both spatial and contextual properties of camera and radar for 3D object detection. Our fusion framework first associates image proposal with radar points in the polar coordinate system to efficiently handle the discrepancy between the coordinate system and spatial properties. Using this as a first stage, following consecutive cross-attention based feature fusion layers adaptively exchange spatio-contextual information between camera and radar, leading to a robust and attentive fusion. Our camera-radar fusion approach achieves the state-of-the-art 41.1% mAP and 52.3% NDS on the nuScenes test set, which is 8.7 and 10.8 points higher than the camera-only baseline, as well as yielding competitive performance on the LiDAR method.
PDF Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI’23)
点此查看论文截图
A Benchmark of Long-tailed Instance Segmentation with Noisy Labels (Short Version)
Authors:Guanlin Li, Guowen Xu, Tianwei Zhang
In this paper, we consider the instance segmentation task on a long-tailed dataset, which contains label noise, i.e., some of the annotations are incorrect. There are two main reasons making this case realistic. First, datasets collected from real world usually obey a long-tailed distribution. Second, for instance segmentation datasets, as there are many instances in one image and some of them are tiny, it is easier to introduce noise into the annotations. Specifically, we propose a new dataset, which is a large vocabulary long-tailed dataset containing label noise for instance segmentation. Furthermore, we evaluate previous proposed instance segmentation algorithms on this dataset. The results indicate that the noise in the training dataset will hamper the model in learning rare categories and decrease the overall performance, and inspire us to explore more effective approaches to address this practical challenge. The code and dataset are available in https://github.com/GuanlinLee/Noisy-LVIS.
PDF