Detection/Segmentation/Tracking


Updated 2022-09-27

Adversarial Dual-Student with Differentiable Spatial Warping for Semi-Supervised Semantic Segmentation

Authors: Cong Cao, Tianwei Lin, Dongliang He, Fu Li, Huanjing Yue, Jingyu Yang, Errui Ding

A common challenge for robust semantic segmentation is the high cost of data annotation. Existing semi-supervised solutions show great potential for solving this problem. Their key idea is to construct consistency regularization for model training by applying unsupervised data augmentation to unlabeled data. Perturbing the unlabeled data enables a consistency training loss, which benefits semi-supervised semantic segmentation. However, these perturbations destroy image context and introduce unnatural boundaries, which is harmful for semantic segmentation. Besides, the widely adopted semi-supervised learning framework, i.e. Mean-Teacher, suffers from a performance limitation because the student model eventually converges to the teacher model. In this paper, we first propose a context-friendly differentiable geometric warping to conduct unsupervised data augmentation; second, we propose a novel adversarial dual-student framework that improves Mean-Teacher in two respects: (1) the two student models are learned independently except for a stabilization constraint, which encourages them to exploit model diversity; (2) an adversarial training scheme is applied to both students, and discriminators are used to distinguish reliable pseudo-labels of unlabeled data for self-training. Effectiveness is validated via extensive experiments on PASCAL VOC2012 and Cityscapes. Our solution significantly improves performance and achieves state-of-the-art results on both datasets. Remarkably, on PASCAL VOC2012 our solution achieves an mIoU of 73.4% using only 12.5% of the annotated data, comparable to full supervision. Our code and models are available at https://github.com/caocong/ADS-SemiSeg.
Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
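For readers unfamiliar with Mean-Teacher, the sketch below shows its core step in plain Python: the teacher's weights are an exponential moving average (EMA) of the student's, which is why the two models end up tightly coupled. This is the generic Mean-Teacher update, not the authors' adversarial dual-student code; the flat weight lists, the `ema_update` helper and `alpha = 0.99` are illustrative assumptions.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], alpha: float = 0.99) -> List[float]:
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of parameters")
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


# Toy run (assumption: the student is frozen here), so the teacher drifts toward it.
teacher = [0.0, 0.0]
student = [1.0, -1.0]
for _ in range(500):
    teacher = ema_update(teacher, student, alpha=0.99)
print([round(w, 3) for w in teacher])  # teacher is now close to [1.0, -1.0]
```

In real training the student keeps changing through gradient updates, but the same averaging still pulls the teacher toward the student at every step.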


Revisiting Image Pyramid Structure for High Resolution Salient Object Detection

Authors: Taehun Kim, Kunhee Kim, Joonyeong Lee, Dongmin Cha, Jiho Lee, Daijin Kim

Salient object detection (SOD) has been in the spotlight recently, yet it has been studied less for high-resolution (HR) images. Unfortunately, collecting HR images and their pixel-level annotations is more labor-intensive and time-consuming than collecting low-resolution (LR) images and annotations. Therefore, we propose an image pyramid-based SOD framework, Inverse Saliency Pyramid Reconstruction Network (InSPyReNet), for HR prediction without any HR datasets. We design InSPyReNet to produce a strict image pyramid structure of the saliency map, which enables ensembling multiple results with pyramid-based image blending. For HR prediction, we design a pyramid blending method that synthesizes two different image pyramids from an LR and an HR scale of the same image to overcome the effective receptive field (ERF) discrepancy. Our extensive evaluations on public LR and HR SOD benchmarks demonstrate that InSPyReNet surpasses State-of-the-Art (SotA) methods on various SOD metrics and in boundary accuracy.
27 pages, 15 figures, 7 tables. To appear in the 16th Asian Conference on Computer Vision (ACCV2022), December 4-8, 2022, Macau SAR, China. DOI will be added soon
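The "strict image pyramid" idea rests on the Laplacian pyramid: each level stores only the detail lost when moving to the next coarser scale, so the input can be rebuilt exactly from its low-resolution base plus the stored details. The 1D pure-Python sketch below illustrates that decomposition; it is not InSPyReNet's code, and the pair-averaging downsampler, nearest-neighbour upsampler and helper names are simplifying assumptions (real image pyramids are 2D and usually use Gaussian filtering).

```python
from typing import List

Signal = List[float]


def downsample(x: Signal) -> Signal:
    """Halve the resolution by averaging neighbouring pairs."""
    if len(x) % 2:
        raise ValueError("signal length must be even at every pyramid level")
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]


def upsample(x: Signal) -> Signal:
    """Double the resolution by nearest-neighbour repetition."""
    out: Signal = []
    for v in x:
        out.extend([v, v])
    return out


def build_laplacian_pyramid(x: Signal, levels: int) -> List[Signal]:
    """Return [detail_0, ..., detail_{levels-1}, base]."""
    pyramid: List[Signal] = []
    current = x
    for _ in range(levels):
        coarse = downsample(current)
        # detail = what upsampling the coarse level fails to recover
        pyramid.append([a - b for a, b in zip(current, upsample(coarse))])
        current = coarse
    pyramid.append(current)  # low-resolution base
    return pyramid


def reconstruct(pyramid: List[Signal]) -> Signal:
    """Invert build_laplacian_pyramid: upsample the base, add back each detail level."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = [a + b for a, b in zip(upsample(current), detail)]
    return current


signal = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0]
pyr = build_laplacian_pyramid(signal, levels=2)
print([len(level) for level in pyr])  # [8, 4, 2]
print(reconstruct(pyr))  # [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0]
```

Pyramid blending operates on these levels, for example by mixing the coarse levels of one pyramid with the fine levels of another before calling `reconstruct`; which levels InSPyReNet takes from its LR and HR inputs is defined in the paper, not here.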


Any Object is a Potential Weapon! Weaponized Violence Detection using Salient Image

Authors: Toluwani Aremu, Li Zhiyuan, Reem Alameeri

In connected smart cities around the world, CCTVs have played a pivotal role in ensuring the safety and security of citizens by recording unlawful activities so that authorities can take action. To make CCTVs efficient and effective in this domain, researchers and developers have created and used different DNN architectures to detect either violence or weapons, using bounding boxes or masks. These weapons are limited to guns, knives, and other obvious handheld weapons. To remove this limitation and detect weapons more efficiently, weaponized violence in CCTV footage must be distinguishable from non-weaponized violence. Since no current datasets are tailored to generalizable weaponized violence detection, we introduce a new dataset containing videos of weaponized violence, non-weaponized violence, and non-violent events. We also propose a novel data-centric method that arranges video frames into salient images while minimizing information loss, easing inference by SOTA image classifiers. This simplifies video classification and reduces inference latency, improving sustainability in smart cities. Our experiments show that image classifiers can efficiently detect and distinguish violence with weapons from violence without weapons, with performance as high as 99%, comparable with current SOTA 3D networks for action recognition and video classification.
8 pages, 3 figures, 4 tables
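One simple way to turn a clip into a single image that an off-the-shelf image classifier can take is to tile sampled frames into a grid. The sketch below shows such a mosaic in plain Python; the paper's salient-image construction may select or arrange frames differently, and `tile_frames`, the grayscale list-of-rows frame format and the 2-column layout are assumptions for illustration.

```python
from typing import List

Frame = List[List[int]]  # assumption: a grayscale frame stored as rows of pixel values


def tile_frames(frames: List[Frame], cols: int) -> Frame:
    """Arrange equally sized frames, row by row, into one grid image."""
    if not frames or len(frames) % cols != 0:
        raise ValueError("need a non-empty frame list whose length is a multiple of cols")
    h = len(frames[0])
    grid: Frame = []
    for start in range(0, len(frames), cols):
        row_of_frames = frames[start:start + cols]
        for y in range(h):
            # concatenate pixel row y of every frame in this grid row
            line: List[int] = []
            for f in row_of_frames:
                line.extend(f[y])
            grid.append(line)
    return grid


# Four 2x2 "frames" sampled from a clip, each filled with its frame index.
clip = [[[i, i], [i, i]] for i in range(4)]
image = tile_frames(clip, cols=2)
for row in image:
    print(row)
```

Every pixel of the sampled frames survives in the mosaic; in practice the grid is then resized to the classifier's input resolution, which is where information loss can creep in.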


Towards Stable Co-saliency Detection and Object Co-segmentation

Authors: Bo Li, Lv Tang, Senyun Kuang, Mofei Song, Shouhong Ding

In this paper, we present a novel model for simultaneous stable co-saliency detection (CoSOD) and object co-segmentation (CoSEG). To detect co-saliency (segmentation) accurately, the core problem is modeling inter-image relations within an image group well. Some methods design sophisticated modules, such as recurrent neural networks (RNNs), to address this problem. However, order sensitivity is the major drawback of RNNs, and it heavily affects the stability of the resulting CoSOD (CoSEG) models. Inspired by RNN-based models, we first propose a multi-path stable recurrent unit (MSRU), containing a dummy orders mechanism (DOM) and a recurrent unit (RU). Our proposed MSRU not only helps the CoSOD (CoSEG) model capture robust inter-image relations but also reduces order sensitivity, resulting in more stable inference and training. Moreover, we design a cross-order contrastive loss (COCL) that further addresses the order-sensitivity problem by pulling together the feature embeddings generated from different input orders. We validate our model on five widely used CoSOD datasets (CoCA, CoSOD3k, Cosal2015, iCoseg and MSRC) and three widely used object co-segmentation datasets (Internet, iCoseg and PASCAL-VOC); the results demonstrate the superiority of the proposed approach over state-of-the-art (SOTA) methods.
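The cross-order idea can be illustrated with a standard InfoNCE-style contrastive loss: the embedding of image i computed under one input order is the positive for its embedding under another order, and the other images in the group act as negatives. This is a generic sketch, not the paper's exact COCL formulation; the cosine similarity, the temperature `tau = 0.1` and the function names are assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def cross_order_info_nce(order_a: List[Vector], order_b: List[Vector], tau: float = 0.1) -> float:
    """InfoNCE loss where order_a[i] and order_b[i] embed image i under two input orders."""
    n = len(order_a)
    loss = 0.0
    for i in range(n):
        logits = [cosine(order_a[i], order_b[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denominator - logits[i]  # -log softmax of the positive pair
    return loss / n


emb_a = [[1.0, 0.0], [0.0, 1.0]]
consistent = [[0.9, 0.1], [0.1, 0.9]]  # each image keeps roughly the same embedding
swapped = [[0.1, 0.9], [0.9, 0.1]]     # embeddings changed with the input order
print(cross_order_info_nce(emb_a, consistent) < cross_order_info_nce(emb_a, swapped))  # True
```

Minimizing this loss rewards a model whose per-image features do not depend on the order in which the group was fed in, which is the property the abstract targets.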


Author: 木子已
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit 木子已 as the source when reposting!