Detection / Segmentation / Tracking


Updated 2024-01-12

RaTrack: Moving Object Detection and Tracking with 4D Radar Point Cloud

Authors:Zhijun Pan, Fangqiang Ding, Hantao Zhong, Chris Xiaoxuan Lu

Mobile autonomy relies on the precise perception of dynamic environments. Robustly tracking moving objects in the 3D world thus plays a pivotal role in applications like trajectory prediction, obstacle avoidance, and path planning. While most current methods utilize LiDARs or cameras for Multiple Object Tracking (MOT), the capabilities of 4D imaging radars remain largely unexplored. Recognizing the challenges posed by radar noise and point sparsity in 4D radar data, we introduce RaTrack, an innovative solution tailored for radar-based tracking. Bypassing the typical reliance on specific object types and 3D bounding boxes, our method focuses on motion segmentation and clustering, enriched by a motion estimation module. Evaluated on the View-of-Delft dataset, RaTrack showcases superior tracking precision of moving objects, largely surpassing the performance of the state of the art.
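To make the box-free idea concrete, here is a minimal Python sketch of the pipeline the abstract describes: per-point motion segmentation followed by clustering of the moving radar points into class-agnostic objects. The probability threshold, DBSCAN parameters, and centroid output are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a box-free radar MOT detection step: motion segmentation
# on the point level, then clustering of the moving points into objects.
# Thresholds and DBSCAN parameters below are illustrative assumptions.
import numpy as np
from sklearn.cluster import DBSCAN

def detect_moving_objects(points, moving_prob, prob_thresh=0.5,
                          eps=1.0, min_samples=3):
    """points: (N, 3) radar points; moving_prob: (N,) per-point moving scores."""
    moving = points[moving_prob > prob_thresh]            # motion segmentation
    if len(moving) == 0:
        return []
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(moving)
    clusters = [moving[labels == k] for k in set(labels) if k != -1]
    return [c.mean(axis=0) for c in clusters]             # one centroid per object
```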
PDF 8 pages, 4 figures. Co-first authorship for Zhijun Pan, Fangqiang Ding and Hantao Zhong, listed randomly

Click here to view the paper screenshots

UCMCTrack: Multi-Object Tracking with Uniform Camera Motion Compensation

Authors:Kefu Yi, Kai Luo, Xiaolei Luo, Jiangui Huang, Hao Wu, Rongdong Hu, Wei Hao

Multi-object tracking (MOT) in video sequences remains a challenging task, especially in scenarios with significant camera movements. This is because targets can drift considerably on the image plane, leading to erroneous tracking outcomes. Addressing such challenges typically requires supplementary appearance cues or Camera Motion Compensation (CMC). While these strategies are effective, they also introduce a considerable computational burden, posing challenges for real-time MOT. In response to this, we introduce UCMCTrack, a novel motion model-based tracker robust to camera movements. Unlike conventional CMC that computes compensation parameters frame-by-frame, UCMCTrack consistently applies the same compensation parameters throughout a video sequence. It employs a Kalman filter on the ground plane and introduces the Mapped Mahalanobis Distance (MMD) as an alternative to the traditional Intersection over Union (IoU) distance measure. By leveraging projected probability distributions on the ground plane, our approach efficiently captures motion patterns and adeptly manages uncertainties introduced by homography projections. Remarkably, UCMCTrack, relying solely on motion cues, achieves state-of-the-art performance across a variety of challenging datasets, including MOT17, MOT20, DanceTrack and KITTI. More details and code are available at https://github.com/corfyi/UCMCTrack
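The association step can be sketched as follows: a detection's reference point is mapped to the ground plane through a fixed homography, its pixel uncertainty is propagated through the homography's Jacobian, and the result is compared with a predicted ground-plane state via a Mahalanobis distance. The finite-difference Jacobian, the noise settings, and the interface below are simplifying assumptions, not the paper's exact formulation.

```python
# Sketch of a Mapped Mahalanobis Distance on the ground plane, assuming a
# fixed homography H and a Kalman-predicted ground-plane state (pred_xy,
# pred_cov). Not the paper's exact formulation.
import numpy as np

def project_to_ground(H, uv):
    """Map pixel (u, v) to the ground plane via homography H (3x3)."""
    p = H @ np.array([uv[0], uv[1], 1.0])
    return p[:2] / p[2]

def homography_jacobian(H, uv):
    """Jacobian of the projection w.r.t. (u, v), by finite differences."""
    eps = 1e-4
    base = project_to_ground(H, uv)
    J = np.zeros((2, 2))
    for i in range(2):
        d = np.array(uv, dtype=float)
        d[i] += eps
        J[:, i] = (project_to_ground(H, d) - base) / eps
    return J

def mapped_mahalanobis(H, uv, pixel_cov, pred_xy, pred_cov):
    """Distance between a mapped detection and a predicted ground-plane state."""
    xy = project_to_ground(H, uv)
    J = homography_jacobian(H, uv)
    cov = J @ pixel_cov @ J.T + pred_cov    # combined uncertainty on the ground
    diff = xy - pred_xy
    return float(diff @ np.linalg.solve(cov, diff))
```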
PDF Accepted to AAAI 2024

Click here to view the paper screenshots

Explicit Visual Prompts for Visual Object Tracking

Authors:Liangtao Shi, Bineng Zhong, Qihua Liang, Ning Li, Shengping Zhang, Xianxian Li

How to effectively exploit spatio-temporal information is crucial to capture target appearance changes in visual tracking. However, most deep learning-based trackers mainly focus on designing a complicated appearance model or template updating strategy, while lacking the exploitation of context between consecutive frames and thus entailing the when-and-how-to-update dilemma. To address these issues, we propose a novel explicit visual prompts framework for visual tracking, dubbed EVPTrack. Specifically, we utilize spatio-temporal tokens to propagate information between consecutive frames without focusing on updating templates. As a result, we can not only alleviate the challenge of when-to-update, but also avoid the hyper-parameters associated with updating strategies. Then, we utilize the spatio-temporal tokens to generate explicit visual prompts that facilitate inference in the current frame. The prompts are fed into a transformer encoder together with the image tokens without additional processing. Consequently, the efficiency of our model is improved by avoiding how-to-update. In addition, we consider multi-scale information as explicit visual prompts, providing multi-scale template features to enhance EVPTrack's ability to handle target scale changes. Extensive experimental results on six benchmarks (i.e., LaSOT, LaSOT_ext, GOT-10k, UAV123, TrackingNet, and TNL2K) validate that our EVPTrack can achieve competitive performance at a real-time speed by effectively exploiting both spatio-temporal and multi-scale information. Code and models are available at https://github.com/GXNU-ZhongLab/EVPTrack.
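The prompt mechanism can be illustrated with a small PyTorch sketch: spatio-temporal prompt tokens carried over from previous frames are concatenated with the current frame's image tokens and encoded jointly, and the updated prompt tokens are passed on instead of updating a template. The token dimension, prompt count, and encoder depth below are assumptions, not EVPTrack's actual configuration.

```python
# Minimal sketch of the "explicit visual prompt" idea: prompt tokens are
# concatenated with image tokens and encoded jointly, with no template
# update step. Dimensions and depths are illustrative assumptions.
import torch
import torch.nn as nn

class PromptEncoder(nn.Module):
    def __init__(self, dim=256, num_prompts=8, depth=2, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.num_prompts = num_prompts

    def forward(self, image_tokens, prompt_tokens):
        """image_tokens: (B, N, C); prompt_tokens: (B, P, C) from past frames."""
        x = torch.cat([prompt_tokens, image_tokens], dim=1)  # no extra processing
        x = self.encoder(x)
        # Updated prompts are carried to the next frame; the remaining tokens
        # are used for localization in the current frame.
        new_prompts, feat = x[:, :self.num_prompts], x[:, self.num_prompts:]
        return feat, new_prompts
```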
PDF

Click here to view the paper screenshots

A New Dataset and a Distractor-Aware Architecture for Transparent Object Tracking

Authors:Alan Lukezic, Ziga Trojer, Jiri Matas, Matej Kristan

Performance of modern trackers degrades substantially on transparent objects compared to opaque objects. This is largely due to two distinct reasons. Transparent objects are unique in that their appearance is directly affected by the background. Furthermore, transparent object scenes often contain many visually similar objects (distractors), which often lead to tracking failure. However, development of modern tracking architectures requires large training sets, which do not exist in transparent object tracking. We present two contributions addressing the aforementioned issues. We propose Trans2k, the first transparent object tracking training dataset, which consists of over 2k sequences with 104,343 images overall, annotated with bounding boxes and segmentation masks. Standard trackers trained on this dataset consistently improve by up to 16%. Our second contribution is a new distractor-aware transparent object tracker (DiTra) that treats localization accuracy and target identification as separate tasks and implements them by a novel architecture. DiTra sets a new state of the art in transparent object tracking and generalizes well to opaque objects.
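One way to read the "separate tasks" design is sketched below: one branch only refines the box of each candidate region, another only scores whether the candidate is the target rather than a distractor, and the two outputs are combined only at selection time. The two branches here are placeholder callables, not DiTra's actual architecture.

```python
# Sketch of separating localization from target identification. The two
# branch functions are hypothetical placeholders for illustration only.
from dataclasses import dataclass
import numpy as np

@dataclass
class Candidate:
    box: np.ndarray        # (4,) x, y, w, h
    features: np.ndarray   # appearance features of the candidate region

def select_target(candidates, refine_box, identify_score):
    """refine_box / identify_score stand in for the two separate branches."""
    scored = []
    for c in candidates:
        box = refine_box(c)          # localization branch: accurate box only
        score = identify_score(c)    # identification branch: target vs distractor
        scored.append((score, box))
    return max(scored, key=lambda s: s[0])[1]
```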
PDF Under review. arXiv admin note: substantial text overlap with arXiv:2210.03436

Click here to view the paper screenshots

Multi-scale attention-based instance segmentation for measuring crystals with large size variation

Authors:Theresa Neubauer, Astrid Berg, Maria Wimmer, Dimitrios Lenis, David Major, Philip Matthias Winter, Gaia Romana De Paolis, Johannes Novotny, Daniel Lüftner, Katja Reinharter, Katja Bühler

Quantitative measurement of crystals in high-resolution images allows for important insights into underlying material characteristics. Deep learning has shown great progress in vision-based automatic crystal size measurement, but current instance segmentation methods reach their limits with images that have large variation in crystal size or hard-to-detect crystal boundaries. Even small image segmentation errors, such as incorrectly fused or separated segments, can significantly lower the accuracy of the measured results. Instead of improving the existing pixel-wise boundary segmentation methods, we propose to use an instance-based segmentation method, which gives more robust segmentation results and thus improves measurement accuracy. Our novel method enhances flow maps with a size-aware multi-scale attention module. The attention module adaptively fuses information from multiple scales and focuses on the most relevant scale for each segmented image area. We demonstrate that our proposed attention fusion strategy outperforms state-of-the-art instance and boundary segmentation methods, as well as simple average fusion of multi-scale predictions. We evaluate our method on a refractory raw material dataset of high-resolution images with large variation in crystal size and show that our model can be used to calculate the crystal size more accurately than existing methods.
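The adaptive fusion contrasted with simple averaging can be sketched as a per-pixel softmax over scale-specific predictions: each pixel decides which scale's output dominates at that location. This softmax-weighted fusion is an illustrative stand-in, not the paper's size-aware attention module.

```python
# Sketch of per-pixel adaptive multi-scale fusion (vs. simple averaging).
# The attention form below is an assumption for illustration.
import numpy as np

def fuse_multiscale(predictions, attention_logits):
    """
    predictions:      list of S arrays, each (H, W, C), one per scale,
                      already resized to a common resolution.
    attention_logits: (H, W, S) unnormalized per-pixel scale scores.
    """
    stack = np.stack(predictions, axis=-1)                  # (H, W, C, S)
    w = np.exp(attention_logits - attention_logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)                   # per-pixel softmax over scales
    return (stack * w[:, :, None, :]).sum(axis=-1)          # fused (H, W, C) prediction
```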
PDF Accepted for publication in IEEE Transactions on Instrumentation and Measurement

Click here to view the paper screenshots

CLIP-guided Source-free Object Detection in Aerial Images

Authors:Nanqing Liu, Xun Xu, Yongyi Su, Chengxin Liu, Peiliang Gong, Heng-Chao Li

Domain adaptation is crucial in aerial imagery, as the visual representation of these images can significantly vary based on factors such as geographic location, time, and weather conditions. Additionally, high-resolution aerial images often require substantial storage space and may not be readily accessible to the public. To address these challenges, we propose a novel Source-Free Object Detection (SFOD) method. Specifically, our approach is built upon a self-training framework; however, self-training can lead to inaccurate learning in the absence of labeled training data. To address this issue, we further integrate Contrastive Language-Image Pre-training (CLIP) to guide the generation of pseudo-labels, termed CLIP-guided Aggregation. By leveraging CLIP’s zero-shot classification capability, we use it to aggregate scores with the original predicted bounding boxes, enabling us to obtain refined scores for the pseudo-labels. To validate the effectiveness of our method, we constructed two new datasets from different domains based on the DIOR dataset, named DIOR-C and DIOR-Cloudy. Experiments demonstrate that our method outperforms other comparative algorithms.
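The aggregation step can be sketched as a simple fusion of detector confidences with CLIP zero-shot class probabilities obtained for each predicted box crop. The weighting factor alpha, the keep threshold, and the assumption that CLIP probabilities are precomputed per box are illustrative choices, not the paper's exact scheme.

```python
# Sketch of CLIP-guided score aggregation for pseudo-label refinement.
# clip_probs are assumed to be zero-shot class probabilities already
# computed for each box crop; alpha and keep_thresh are illustrative.
import numpy as np

def refine_pseudo_labels(det_scores, clip_probs, det_labels,
                         alpha=0.5, keep_thresh=0.6):
    """
    det_scores: (N,) detector confidences for N predicted boxes.
    clip_probs: (N, K) CLIP zero-shot probabilities over K classes per box crop.
    det_labels: (N,) class index predicted by the detector.
    """
    clip_score = clip_probs[np.arange(len(det_labels)), det_labels]
    fused = alpha * det_scores + (1.0 - alpha) * clip_score   # aggregated score
    keep = fused > keep_thresh                                # filter pseudo-labels
    return fused, keep
```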
PDF Submitted to IGARSS2024

Click here to view the paper screenshots

Author: 木子已
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit the source 木子已 when reposting!