Updated 2022-10-25
A Flexible-Frame-Rate Vision-Aided Inertial Object Tracking System for Mobile Devices
Authors: Yo-Chung Lau, Kuan-Wei Tseng, I-Ju Hsieh, Hsiao-Ching Tseng, Yi-Ping Hung
Real-time object pose estimation and tracking is challenging but essential for emerging augmented reality (AR) applications. In general, state-of-the-art methods address this problem using deep neural networks, which indeed yield satisfactory results. Nevertheless, the high computational cost of these methods makes them unsuitable for mobile devices, where real-world applications usually take place. In addition, head-mounted displays such as AR glasses require at least 90 FPS to avoid motion sickness, which further complicates the problem. We propose a flexible-frame-rate object pose estimation and tracking system for mobile devices. It is a monocular visual-inertial system with a client-server architecture. Inertial measurement unit (IMU) pose propagation is performed on the client side for high-speed tracking, and RGB image-based 3D pose estimation is performed on the server side to obtain accurate poses. The estimated pose is then sent to the client side for visual-inertial fusion, where we propose a bias self-correction mechanism to reduce drift. We also propose a pose inspection algorithm to detect tracking failures and incorrect pose estimates. Connected by high-speed networking, our system supports flexible frame rates up to 120 FPS and guarantees high-precision, real-time tracking on low-end devices. Both simulations and real-world experiments show that our method achieves accurate and robust object tracking.
PDF
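The split between fast client-side IMU propagation and slower server-side visual pose updates can be illustrated with a toy one-dimensional sketch. This is not the authors' algorithm: the `State` class, the `propagate` and `fuse` functions, and the fixed blending `gain` are all hypothetical, and the paper's bias self-correction and pose inspection steps are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Toy 1D pose state (hypothetical; a real system tracks a 6-DoF pose)."""
    position: float = 0.0
    velocity: float = 0.0


def propagate(state: State, accel: float, dt: float) -> State:
    """Client side: advance the state by one IMU sample using constant-acceleration kinematics."""
    position = state.position + state.velocity * dt + 0.5 * accel * dt * dt
    velocity = state.velocity + accel * dt
    return State(position, velocity)


def fuse(state: State, visual_position: float, gain: float = 0.5) -> State:
    """Pull the propagated position toward a visual estimate received from the server.

    The fixed gain is an assumption made for illustration only.
    """
    position = state.position + gain * (visual_position - state.position)
    return State(position, state.velocity)


# Propagate at a high rate between (less frequent) visual updates.
state = State()
for _ in range(10):
    state = propagate(state, accel=2.0, dt=0.1)

# A visual pose arrives from the server and corrects accumulated drift.
state = fuse(state, visual_position=1.2)
```

The point of the sketch is the rate decoupling: `propagate` can run at the display rate, while `fuse` runs only when a server pose arrives.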
1st Place Solution of The Robust Vision Challenge (RVC) 2022 Semantic Segmentation Track
Authors: Junfei Xiao, Zhichao Xu, Shiyi Lan, Zhiding Yu, Alan Yuille, Anima Anandkumar
This report describes the winning solution to the semantic segmentation task of the Robust Vision Challenge at ECCV 2022. Our method adopts the FAN-B-Hybrid model as the encoder and uses Segformer as the segmentation framework. The model is trained on a combined dataset containing images from 9 datasets (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, Wilddash2, IDD, BDD, and COCO) with a simple dataset balancing strategy. All the original labels are projected to a 256-class unified label space, and the model is trained with a naive cross-entropy loss. Without significant hyperparameter tuning or any specific loss weighting, our solution ranks 1st on all the required semantic segmentation benchmarks from multiple domains (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, and Wilddash2). Our method could serve as a strong baseline for the multi-domain segmentation task, and our codebase could be helpful to future work. Code will be available at https://github.com/lambert-x/RVC_Segmentation.
PDF Winner Solution to The Robust Vision Challenge 2022 Semantic Segmentation Track
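Projecting per-dataset labels into a unified label space, as the abstract above describes, is commonly done with a lookup table. The following is a minimal sketch under stated assumptions: the function names, the toy class ids, and the use of `-1` as an ignore value are hypothetical and are not taken from the authors' codebase.

```python
IGNORE = -1  # hypothetical marker for pixels with no unified class


def build_lookup(mapping, num_source_classes, ignore=IGNORE):
    """Build a table whose entry i is the unified class id for source class i.

    Source classes missing from `mapping` are assigned the ignore value.
    """
    table = [ignore] * num_source_classes
    for source_id, unified_id in mapping.items():
        table[source_id] = unified_id
    return table


def project_labels(label_map, table, ignore=IGNORE):
    """Remap a 2D label map (list of rows) into the unified label space."""
    return [
        [table[v] if 0 <= v < len(table) else ignore for v in row]
        for row in label_map
    ]


# Toy example: a dataset with 3 classes, two of which have a unified id.
# The ids 12 and 3 are made up for illustration.
table = build_lookup({0: 12, 1: 3}, num_source_classes=3)
unified = project_labels([[0, 1], [2, 7]], table)
```

With one such table per source dataset, every training image ends up with labels in the same space, so a single cross-entropy loss can be applied across the combined dataset while ignored pixels are skipped.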