2024-08-29 更新
vFusedSeg3D: 3rd Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation
Authors:Osama Amjad, Ammad Nadeem
In this technical study, we introduce VFusedSeg3D, an innovative multi-modal fusion system created by the VisionRD team that combines camera and LiDAR data to significantly enhance the accuracy of 3D perception. VFusedSeg3D uses the rich semantic content of the camera pictures and the accurate depth sensing of LiDAR to generate a strong and comprehensive environmental understanding, addressing the constraints inherent in each modality. Through a carefully thought-out network architecture that aligns and merges these information at different stages, our novel feature fusion technique combines geometric features from LiDAR point clouds with semantic features from camera images. With the use of multi-modality techniques, performance has significantly improved, yielding a state-of-the-art mIoU of 72.46% on the validation set as opposed to the prior 70.51%.VFusedSeg3D sets a new benchmark in 3D segmentation accuracy. making it an ideal solution for applications requiring precise environmental perception.
PDF
点此查看论文截图
Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images
Authors:Silvia Seidlitz, Jan Sellner, Alexander Studier-Fischer, Alessandro Motta, Berkin Özdemir, Beat P. Müller-Stich, Felix Nickel, Lena Maier-Hein
Robust semantic segmentation of intraoperative image data holds promise for enabling automatic surgical scene understanding and autonomous robotic surgery. While model development and validation are primarily conducted on idealistic scenes, geometric domain shifts, such as occlusions of the situs, are common in real-world open surgeries. To close this gap, we (1) present the first analysis of state-of-the-art (SOA) semantic segmentation models when faced with geometric out-of-distribution (OOD) data, and (2) propose an augmentation technique called “Organ Transplantation”, to enhance generalizability. Our comprehensive validation on six different OOD datasets, comprising 600 RGB and hyperspectral imaging (HSI) cubes from 33 pigs, each annotated with 19 classes, reveals a large performance drop in SOA organ segmentation models on geometric OOD data. This performance decline is observed not only in conventional RGB data (with a dice similarity coefficient (DSC) drop of 46 %) but also in HSI data (with a DSC drop of 45 %), despite the richer spectral information content. The performance decline increases with the spatial granularity of the input data. Our augmentation technique improves SOA model performance by up to 67 % for RGB data and 90 % for HSI data, achieving performance at the level of in-distribution performance on real OOD test data. Given the simplicity and effectiveness of our augmentation method, it is a valuable tool for addressing geometric domain shifts in surgical scene segmentation, regardless of the underlying model. Our code and pre-trained models are publicly available at https://github.com/IMSY-DKFZ/htc.
PDF Silvia Seidlitz and Jan Sellner contributed equally
点此查看论文截图
Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection
Authors:Sondos Mohamed, Walter Zimmer, Ross Greer, Ahmed Alaaeldin Ghita, Modesto Castrillón-Santana, Mohan Trivedi, Alois Knoll, Salvatore Mario Carta, Mirko Marras
Accurately detecting 3D objects from monocular images in dynamic roadside scenarios remains a challenging problem due to varying camera perspectives and unpredictable scene conditions. This paper introduces a two-stage training strategy to address these challenges. Our approach initially trains a model on the large-scale synthetic dataset, RoadSense3D, which offers a diverse range of scenarios for robust feature learning. Subsequently, we fine-tune the model on a combination of real-world datasets to enhance its adaptability to practical conditions. Experimental results of the Cube R-CNN model on challenging public benchmarks show a remarkable improvement in detection performance, with a mean average precision rising from 0.26 to 12.76 on the TUM Traffic A9 Highway dataset and from 2.09 to 6.60 on the DAIR-V2X-I dataset when performing transfer learning. Code, data, and qualitative video results are available on the project website: https://roadsense3d.github.io.
PDF 18 pages. Accepted for ECVA European Conference on Computer Vision 2024 (ECCV’24)
点此查看论文截图
InstanSeg: an embedding-based instance segmentation algorithm optimized for accurate, efficient and portable cell segmentation
Authors:Thibaut Goldsborough, Ben Philps, Alan O’Callaghan, Fiona Inglis, Leo Leplat, Andrew Filby, Hakan Bilen, Peter Bankhead
Cell and nucleus segmentation are fundamental tasks for quantitative bioimage analysis. Despite progress in recent years, biologists and other domain experts still require novel algorithms to handle increasingly large and complex real-world datasets. These algorithms must not only achieve state-of-the-art accuracy, but also be optimized for efficiency, portability and user-friendliness. Here, we introduce InstanSeg: a novel embedding-based instance segmentation pipeline designed to identify cells and nuclei in microscopy images. Using six public cell segmentation datasets, we demonstrate that InstanSeg can significantly improve accuracy when compared to the most widely used alternative methods, while reducing the processing time by at least 60%. Furthermore, InstanSeg is designed to be fully serializable as TorchScript and supports GPU acceleration on a range of hardware. We provide an open-source implementation of InstanSeg in Python, in addition to a user-friendly, interactive QuPath extension for inference written in Java. Our code and pre-trained models are available at https://github.com/instanseg/instanseg .
PDF 12 pages,6 figures