Detection/Segmentation/Tracking


Updated 2022-07-06

Dynamic boxes fusion strategy in object detection

Authors: Zhijiang Wan, Shichang Liu, Manyu Li

Object detection in microscopic scenarios is a popular task. Because microscopes have variable magnifications, objects can vary substantially in scale, which complicates the optimization of detectors. Moreover, varying camera focus produces blurry images, which makes it very challenging to distinguish the boundaries between objects and the background. To address these two issues, we provide a bag of useful training strategies and extensive experiments on the Chula-ParasiteEgg-11 dataset, which achieve non-negligible results on the ICIP 2022 Challenge: Parasitic Egg Detection and Classification in Microscopic Images. Furthermore, we propose a new box selection strategy and an improved box fusion method for multi-model ensembles. As a result, our method wins 1st place (mIoU 95.28%, mF1Score 99.62%) and is also the state-of-the-art method on the Chula-ParasiteEgg-11 dataset.
PDF 5 pages, 3 figures, 7 citations
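
The abstract does not spell out the proposed box selection strategy or the improved fusion method. For context only, the sketch below implements a simplified version of standard Weighted Boxes Fusion (WBF, Solovyev et al.), the kind of multi-model box ensembling such methods typically build on; it is not the authors' method. It assumes single-class boxes in (x1, y1, x2, y2) form, and the function names and the 0.55 IoU threshold are illustrative choices.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Pred = Tuple[Box, float]                 # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(cluster: List[Pred]) -> Pred:
    """Confidence-weighted average of coordinates; mean confidence of the cluster."""
    total = sum(score for _, score in cluster)
    box = tuple(sum(b[k] * s for b, s in cluster) / total for k in range(4))
    return box, total / len(cluster)


def weighted_boxes_fusion(preds_per_model: List[List[Pred]], iou_thr: float = 0.55) -> List[Pred]:
    n_models = len(preds_per_model)
    # Pool predictions from all models and visit them from most to least confident.
    pool = sorted((p for preds in preds_per_model for p in preds),
                  key=lambda p: p[1], reverse=True)
    clusters: List[List[Pred]] = []
    fused: List[Pred] = []
    for box, score in pool:
        # Match the box to the fused box it overlaps most, if above the threshold.
        best, best_iou = -1, iou_thr
        for i, (fbox, _) in enumerate(fused):
            overlap = iou(box, fbox)
            if overlap > best_iou:
                best, best_iou = i, overlap
        if best < 0:
            clusters.append([(box, score)])
            fused.append((box, score))
        else:
            clusters[best].append((box, score))
            fused[best] = fuse(clusters[best])
    # Down-weight fused boxes that only a few of the models agreed on.
    return [(fbox, fscore * min(len(c), n_models) / n_models)
            for c, (fbox, fscore) in zip(clusters, fused)]
```

For example, fusing a 0.9-confidence box from one model with an overlapping 0.6-confidence box from a second model yields a single box pulled toward the more confident prediction, while a box that only one of the two models produced keeps half of its score.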

ORA3D: Overlap Region Aware Multi-view 3D Object Detection

Authors: Wonseok Roh, Gyusam Chang, Seokha Moon, Giljoo Nam, Chanyoung Kim, Younghyun Kim, Sangpil Kim, Jinkyu Kim

In multi-view 3D object detection tasks, disparity supervision over overlapping image regions substantially improves the overall detection performance. However, current multi-view 3D object detection methods often fail to detect objects in the overlap region properly, and the network's understanding of the scene is often limited to that of a monocular detection network. To mitigate this issue, we advocate applying a traditional stereo disparity estimation method to obtain reliable disparity information for the overlap region. Using the disparity estimates as supervision, we propose to regularize the network to fully utilize the geometric potential of binocular images and improve the overall detection accuracy. Moreover, we propose an adversarial overlap region discriminator, trained to minimize the representational gap between non-overlap regions and overlapping regions, where objects are often largely occluded or deformed by camera distortion, causing a domain shift. We demonstrate the effectiveness of the proposed method on nuScenes, a large-scale multi-view 3D object detection benchmark. Our experiments show that the proposed method outperforms current state-of-the-art methods.
PDF
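
The abstract says only that a "traditional stereo disparity estimation method" is applied to the overlap region, without naming it. As an illustration of that classical idea (an assumption, not necessarily ORA3D's choice), the sketch below performs sum-of-absolute-differences (SAD) block matching along a single scanline, assuming the overlapping views have already been rectified, and converts disparity to depth with Z = f * B / d. The function names and the 1D setup are hypothetical simplifications.

```python
from typing import Sequence


def sad_disparity(left: Sequence[float], right: Sequence[float],
                  x: int, half_window: int, max_disp: int) -> int:
    """Disparity at column x of a rectified left scanline via SAD block matching.

    In a rectified pair, a point at column x in the left image appears at
    column x - d in the right image; choose the d with the lowest matching cost.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        lo, hi = x - d - half_window, x - d + half_window
        if lo < 0 or x + half_window >= len(left) or hi >= len(right):
            continue  # window would fall outside the scanline
        cost = sum(abs(left[x + k] - right[x - d + k])
                   for k in range(-half_window, half_window + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Pinhole stereo geometry: depth Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

Real systems match 2D windows and add smoothness constraints (e.g. semi-global matching); the 1D version only shows the core cost-minimization step.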

Traffic Context Aware Data Augmentation for Rare Object Detection in Autonomous Driving

Authors: Naifan Li, Fan Song, Ying Zhang, Pengpeng Liang, Erkang Cheng

Detection of rare objects (e.g., traffic cones, traffic barrels and traffic warning triangles) is an important perception task for improving the safety of autonomous driving. Training such models typically requires a large amount of annotated data, which is expensive and time-consuming to obtain. To address this problem, an emerging approach is to apply data augmentation to automatically generate cost-free training samples. In this work, we present a systematic study of simple Copy-Paste data augmentation for rare object detection in autonomous driving. Specifically, a local adaptive instance-level image transformation is introduced to generate realistic rare object masks from the source domain to the target domain. Moreover, traffic scene context is used to guide the placement of rare object masks. In this way, our data augmentation generates high-quality, realistic training data by leveraging both local and global consistency. In addition, we build a new dataset, the Rare Object Dataset (ROD), consisting of 10k training images, 4k validation images and the corresponding labels, covering a diverse range of autonomous driving scenarios. Experiments on ROD show that our method achieves promising results on rare object detection. We also present a thorough study illustrating the effectiveness of our Copy-Paste data augmentation, based on local-adaptive and global constraints, for rare object detection. The data, development kit and more information about ROD are available online at: https://nullmax-vision.github.io.
PDF IEEE International Conference on Robotics and Automation, ICRA 2022
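
As a rough illustration of context-guided Copy-Paste (not the paper's pipeline, which also applies local adaptive instance-level transformations that are omitted here), the sketch below pastes a masked instance into a target image and samples only positions where a hypothetical road mask says the object's base touches the road. Images are small integer grids to keep it dependency-free; `road`, `context_positions` and the bottom-center placement rule are all assumptions.

```python
import random
from typing import List, Optional, Tuple

Grid = List[List[int]]


def paste_instance(target: Grid, patch: Grid, mask: Grid, top: int, left: int) -> Grid:
    """Copy the masked pixels of `patch` into a copy of `target` at (top, left)."""
    out = [row[:] for row in target]
    for i, (prow, mrow) in enumerate(zip(patch, mask)):
        for j, (value, keep) in enumerate(zip(prow, mrow)):
            if keep:
                out[top + i][left + j] = value
    return out


def context_positions(road: Grid, h: int, w: int) -> List[Tuple[int, int]]:
    """Top-left corners where the patch's bottom-center pixel lands on road."""
    rows, cols = len(road), len(road[0])
    return [(top, left)
            for top in range(rows - h + 1)
            for left in range(cols - w + 1)
            if road[top + h - 1][left + w // 2]]


def copy_paste(target: Grid, patch: Grid, mask: Grid, road: Grid,
               rng: random.Random) -> Optional[Tuple[Grid, Tuple[int, int, int, int]]]:
    h, w = len(patch), len(patch[0])
    candidates = context_positions(road, h, w)
    if not candidates:
        return None  # no plausible placement in this scene
    top, left = rng.choice(candidates)
    # Label for the pasted instance: patch extent as (x1, y1, x2, y2), exclusive ends.
    return paste_instance(target, patch, mask, top, left), (left, top, left + w, top + h)
```

Restricting placement with a scene-level mask is one simple way to encode global consistency; returning `None` lets the caller skip scenes where no plausible location exists instead of pasting objects into the sky.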

Attention Guided Network for Salient Object Detection in Optical Remote Sensing Images

Authors: Yuhan Lin, Han Sun, Ningzhong Liu, Yetong Bian, Jun Cen, Huiyu Zhou

Due to the extreme complexity of scale and shape as well as the uncertainty of the predicted location, salient object detection in optical remote sensing images (RSI-SOD) is a very difficult task. Existing SOD methods perform well on natural scene images, but they are not well adapted to RSI-SOD because of the above-mentioned characteristics of remote sensing images. In this paper, we propose a novel Attention Guided Network (AGNet) for SOD in optical RSIs, comprising a position enhancement stage and a detail refinement stage. Specifically, the position enhancement stage consists of a semantic attention module and a contextual attention module that accurately describe the approximate location of salient objects. The detail refinement stage uses the proposed self-refinement module to progressively refine the predicted results under the guidance of attention and reverse attention. In addition, a hybrid loss is applied to supervise the training of the network, which improves the performance of the model from three perspectives: pixel, region and statistics. Extensive experiments on two popular benchmarks demonstrate that AGNet achieves competitive performance compared to other state-of-the-art methods. The code will be available at https://github.com/NuaaYH/AGNet.
PDF accepted by ICANN2022, The code is available at https://github.com/NuaaYH/AGNet
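
The abstract says the hybrid loss supervises training at the pixel, region and statistics levels but does not list its terms. A common way to build such a loss in salient object detection, assumed here, is binary cross-entropy (pixel), a soft IoU loss (region) and a soft F-measure loss for the statistical term; treat the exact composition and the equal weighting as guesses. Predictions and masks are flat lists of per-pixel probabilities, and all function names are hypothetical.

```python
import math
from typing import Sequence


def bce_loss(pred: Sequence[float], gt: Sequence[float], eps: float = 1e-7) -> float:
    """Pixel level: mean binary cross-entropy."""
    total = 0.0
    for p, g in zip(pred, gt):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= g * math.log(p) + (1.0 - g) * math.log(1.0 - p)
    return total / len(pred)


def iou_loss(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Region level: 1 - soft IoU between the saliency map and the mask."""
    inter = sum(p * g for p, g in zip(pred, gt))
    union = sum(pred) + sum(gt) - inter
    return 1.0 - inter / union if union > 0 else 0.0


def f_measure_loss(pred: Sequence[float], gt: Sequence[float],
                   beta2: float = 0.3, eps: float = 1e-7) -> float:
    """Statistics level (assumed): 1 - soft F-measure, beta^2 = 0.3 as is common in SOD."""
    tp = sum(p * g for p, g in zip(pred, gt))
    precision = tp / (sum(pred) + eps)
    recall = tp / (sum(gt) + eps)
    f = (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)
    return 1.0 - f


def hybrid_loss(pred: Sequence[float], gt: Sequence[float]) -> float:
    # Equal weights are an assumption; papers often tune them.
    return bce_loss(pred, gt) + iou_loss(pred, gt) + f_measure_loss(pred, gt)
```

The three terms complement each other: BCE treats every pixel independently, while the IoU and F-measure terms score the prediction as a whole region, which penalizes missing or spurious blobs that per-pixel losses under-weight.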

Domain Adaptive Nuclei Instance Segmentation and Classification via Category-aware Feature Alignment and Pseudo-labelling

Authors: Canran Li, Dongnan Liu, Haoran Li, Zheng Zhang, Guangming Lu, Xiaojun Chang, Weidong Cai

Unsupervised domain adaptation (UDA) methods have been broadly used to improve the adaptation ability of models in general computer vision. However, unlike natural images, histopathology images exhibit huge semantic gaps between nuclei of different categories. How to build generalized UDA models for precise segmentation or classification of nuclei instances across different datasets remains under-explored. In this work, we propose a novel deep neural network, namely the Category-Aware feature alignment and Pseudo-Labelling Network (CAPL-Net), for UDA nuclei instance segmentation and classification. Specifically, we first propose a category-level feature alignment module with dynamic learnable trade-off weights. Second, we propose to improve the model's performance on the target data via self-supervised training with pseudo labels based on nuclei-level prototype features. Comprehensive experiments on cross-domain nuclei instance segmentation and classification tasks demonstrate that our approach outperforms state-of-the-art UDA methods by a remarkable margin.
PDF Early accepted by MICCAI 2022
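
The abstract only states that pseudo labels are derived from nuclei-level prototype features. One straightforward reading, sketched here as an assumption, is: average labelled source features per category into prototypes, label each target nucleus with its most similar prototype by cosine similarity, and drop assignments below a confidence threshold. The 0.9 threshold, the ignore value of -1 and all function names are hypothetical; CAPL-Net's actual rule may differ.

```python
import math
from typing import Dict, List, Sequence

Vec = Sequence[float]


def class_prototypes(features: List[Vec], labels: List[int]) -> Dict[int, List[float]]:
    """Mean feature vector per class: one prototype per nucleus category."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for f, c in zip(features, labels):
        acc = sums.setdefault(c, [0.0] * len(f))
        for k, v in enumerate(f):
            acc[k] += v
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def pseudo_labels(target_feats: List[Vec], protos: Dict[int, List[float]],
                  threshold: float = 0.9, ignore: int = -1) -> List[int]:
    """Nearest-prototype label per target nucleus; low-similarity ones are ignored."""
    labels = []
    for f in target_feats:
        cls, sim = max(((c, cosine(f, p)) for c, p in protos.items()), key=lambda t: t[1])
        labels.append(cls if sim >= threshold else ignore)
    return labels
```

Ignored nuclei are simply excluded from the self-supervised loss, so ambiguous target samples do not reinforce wrong labels.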

Latents2Segments: Disentangling the Latent Space of Generative Models for Semantic Segmentation of Face Images

Authors: Snehal Singh Tomar, A. N. Rajagopalan

With the advent of an increasing number of Augmented and Virtual Reality applications that aim to perform meaningful and controlled style edits on images of human faces, the impetus for parsing face images into accurate and fine-grained semantic segmentation maps is greater than ever. The few state-of-the-art (SOTA) methods that solve this problem do so by incorporating priors on facial structure or other face attributes, such as expression and pose, into their deep classifier architectures. In this work, we aim to do away with the priors and complex pre-processing operations required by SOTA multi-class face segmentation models by reframing segmentation as a downstream task performed after disentangling the latent space of a Generative Autoencoder with respect to facial semantic regions of interest (ROIs). We present results for our model on the CelebAMask-HQ and HELEN datasets. The encoded latent space of our model achieves significantly higher disentanglement with respect to semantic ROIs than that of other SOTA works. Moreover, it achieves a 13% faster inference rate and comparable accuracy relative to the publicly available SOTA for the downstream task of semantic segmentation of face images.
PDF 5 pages, 4 figures, 2 tables. The paper has already been accepted to and presented at CVPR Workshop on Computer Vision for Augmented and Virtual Reality, New Orleans, LA, 2022

Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention

Authors: Gary Leung, Jun Gao, Xiaohui Zeng, Sanja Fidler

Existing transformer-based image backbones typically propagate feature information in one direction, from lower to higher levels. This may not be ideal, since the localization ability needed to delineate accurate object boundaries is most prominent in the lower, high-resolution feature maps, while the semantics that can disambiguate image signals belonging to one object vs. another typically emerge at a higher level of processing. We present Hierarchical Inter-Level Attention (HILA), an attention-based method that captures Bottom-Up and Top-Down Updates between features of different levels. HILA extends hierarchical vision transformer architectures by adding local connections between features of higher and lower levels to the backbone encoder. In each iteration, we construct a hierarchy by having higher-level features compete for assignments to update the lower-level features belonging to them, iteratively resolving object-part relationships. These improved lower-level features are then used to re-update the higher-level features. HILA can be integrated into the majority of hierarchical architectures without requiring any changes to the base model. We add HILA into SegFormer and the Swin Transformer and show notable improvements in semantic segmentation accuracy with fewer parameters and FLOPs. Project website and code: https://www.cs.toronto.edu/~garyleung/hila/
PDF
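
To make the top-down/bottom-up loop more concrete, here is a heavily simplified sketch under stated assumptions: features are plain vectors, there are no learned query/key/value projections, and every lower-level feature may attend to every higher-level feature, whereas HILA restricts attention to local connections. The softmax over the higher-level axis mirrors "higher-level features compete for assignments"; the residual additions and all function names are hypothetical, not HILA's actual implementation.

```python
import math
from typing import List, Tuple

Feat = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_down(low: List[Feat], high: List[Feat]) -> Tuple[List[Feat], List[List[float]]]:
    """Update each lower-level feature from the higher-level features.

    The softmax runs over the higher-level axis, so parents compete for each child.
    """
    d = len(low[0])
    assign = [softmax([sum(a * b for a, b in zip(l, h)) / math.sqrt(d) for h in high])
              for l in low]
    new_low = [[l[k] + sum(w[j] * high[j][k] for j in range(len(high))) for k in range(d)]
               for l, w in zip(low, assign)]
    return new_low, assign


def bottom_up(high: List[Feat], low: List[Feat], assign: List[List[float]]) -> List[Feat]:
    """Re-update each higher-level feature from the children assigned to it."""
    d = len(high[0])
    new_high = []
    for j, h in enumerate(high):
        weights = [a[j] for a in assign]
        total = sum(weights)
        mean = [sum(w * l[k] for w, l in zip(weights, low)) / total for k in range(d)]
        new_high.append([h[k] + mean[k] for k in range(d)])
    return new_high


def hila_iteration(low: List[Feat], high: List[Feat]) -> Tuple[List[Feat], List[Feat]]:
    # One round: refine children top-down, then refresh parents bottom-up.
    low, assign = top_down(low, high)
    return low, bottom_up(high, low, assign)
```

Running `hila_iteration` repeatedly corresponds to the iterative resolution of object-part relationships described above: each round sharpens which parent every child belongs to.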

Author: 木子已
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit 木子已 as the source when reposting!