Updated 2023-04-17
YOLO-Drone: Airborne real-time detection of dense small objects from high-altitude perspective
Authors: Li Zhu, Jiahui Xiong, Feng Xiong, Hanzheng Hu, Zhengnan Jiang
Unmanned Aerial Vehicles (UAVs), specifically drones equipped with remote sensing object detection technology, have rapidly gained a broad spectrum of applications and emerged as one of the primary research focuses in computer vision. Although UAV remote sensing systems can detect a variety of objects, small objects remain difficult to detect reliably due to their size, image degradation, and real-time constraints. To tackle these issues, a real-time object detection algorithm, YOLO-Drone, is proposed and applied to two new UAV platforms as well as a specific light source (silicon-based golden LEDs). YOLO-Drone introduces several novelties: 1) a new backbone, Darknet59; 2) a new feature aggregation module, MSPP-FPN, which combines one spatial pyramid pooling (SPP) module with three atrous spatial pyramid pooling (ASPP) modules; and 3) Generalized Intersection over Union (GIoU) as the bounding-box regression loss. Performance is evaluated on two benchmark datasets, UAVDT and VisDrone, and on a homemade dataset acquired at night under silicon-based golden LEDs. The experimental results show that on UAVDT and VisDrone, YOLO-Drone outperforms state-of-the-art (SOTA) object detectors, improving mAP by 10.13% and 8.59%, respectively. On UAVDT, YOLO-Drone achieves both real-time inference at 53 FPS and a peak mAP of 34.04%. Notably, under silicon-based golden LEDs, YOLO-Drone reaches an mAP of up to 87.71%, surpassing the YOLO series under ordinary light sources. In conclusion, YOLO-Drone is a highly effective solution for object detection in UAV applications, particularly for night detection tasks, where silicon-based golden LED technology shows a significant advantage.
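GIoU is a standard, published loss, so as a reference, here is a minimal PyTorch sketch of the GIoU loss the abstract names. The (x1, y1, x2, y2) box convention and the `1 - GIoU` loss form are common choices, not details taken from this paper's implementation:

```python
import torch

def giou_loss(pred, target):
    """GIoU loss for axis-aligned boxes in (x1, y1, x2, y2) format.

    GIoU = IoU - |C \ (A ∪ B)| / |C|, where C is the smallest box
    enclosing both A and B; the loss is 1 - GIoU.
    """
    # Intersection rectangle
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    # Union of the two boxes
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-7)

    # Smallest enclosing box C
    cx1 = torch.min(pred[..., 0], target[..., 0])
    cy1 = torch.min(pred[..., 1], target[..., 1])
    cx2 = torch.max(pred[..., 2], target[..., 2])
    cy2 = torch.max(pred[..., 3], target[..., 3])
    c_area = (cx2 - cx1) * (cy2 - cy1)

    giou = iou - (c_area - union) / c_area.clamp(min=1e-7)
    return 1.0 - giou
```

Unlike plain IoU, the enclosing-box term keeps the gradient informative even when predicted and target boxes do not overlap, which matters for the small, sparse objects typical of aerial imagery.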
PDF
Click here to view paper screenshots
MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation
Authors: Jie Guo, Qimeng Wang, Yan Gao, Xiaolong Jiang, Xu Tang, Yao Hu, Baochang Zhang
CLIP (Contrastive Language-Image Pretraining) is well developed for open-vocabulary, zero-shot image-level recognition, but its application to pixel-level tasks is less investigated: most efforts adopt CLIP features directly, without deliberate adaptation. In this work, we first demonstrate the necessity of image-to-pixel CLIP feature adaptation, then propose Multi-View Prompt learning (MVP-SEG) as an effective solution that achieves this adaptation and solves open-vocabulary semantic segmentation. Concretely, MVP-SEG learns multiple prompts trained with our Orthogonal Constraint Loss (OCLoss), under which each prompt is supervised to exploit CLIP features on a different object part, and the segmentation masks generated collaboratively by all prompts yield better segmentation. Moreover, MVP-SEG introduces Global Prompt Refining (GPR) to further suppress class-wise segmentation noise. Experiments show that the multi-view prompts learned from seen categories generalize strongly to unseen categories, and MVP-SEG+, which adds a knowledge-transfer stage, significantly outperforms previous methods on several benchmarks. Qualitative results further confirm that MVP-SEG attends to different local parts.
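The abstract names OCLoss but not its exact form. A common way to realize such an orthogonality constraint is to push the Gram matrix of the L2-normalized prompt embeddings toward the identity, so that prompts are pairwise decorrelated and each one is free to specialize on a different object part. The sketch below follows that assumption; the shapes and the squared off-diagonal penalty are illustrative choices, not the paper's verified formulation:

```python
import torch
import torch.nn.functional as F

def orthogonal_constraint_loss(prompts):
    """Penalize pairwise similarity among K learnable prompt embeddings.

    prompts: tensor of shape (K, D), K >= 2. Driving the Gram matrix
    of the normalized prompts toward the identity encourages each
    prompt to occupy a distinct direction in CLIP's embedding space.
    """
    p = F.normalize(prompts, dim=-1)            # (K, D), unit norm rows
    gram = p @ p.t()                            # (K, K) cosine similarities
    eye = torch.eye(p.size(0), device=p.device)
    # Mean squared off-diagonal similarity (diagonal is exactly 1 - 1 = 0)
    k = p.size(0)
    return ((gram - eye) ** 2).sum() / (k * (k - 1))
```

In training, this term would be added to the segmentation objective so that the multiple prompts remain complementary rather than collapsing onto the same CLIP feature response.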
PDF