GUIDE: Gaussian Unified Instance Detection for Enhanced Obstacle Perception in Autonomous Driving

Chunyong Hu, Qi Luo, Jianyun Xu, Song Wang, Qiang Li, Sheng Yang

TL;DR

GUIDE tackles the challenge of robust obstacle perception in autonomous driving by introducing a Gaussian-based, fully sparse framework that unifies instance detection, instance-level occupancy prediction, and tracking. Each object instance is represented by multiple 3D Gaussians, from which its occupancy is inferred via Gaussian-to-Voxel splatting, while an instance bank enables temporal fusion and ID tracking without dense voxel grids. The approach achieves a notable improvement in instance occupancy mAP on nuScenes (≈21.6 over eight foreground categories, about 50% higher than prior SparseOcc methods) and delivers competitive detection and tracking performance with substantially lower memory usage. Because the voxel resolution can be adjusted at inference time and the end-to-end pipeline is memory-efficient, GUIDE is well suited to scalable, real-time perception in diverse driving environments.
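
To make the splatting step concrete, here is a minimal NumPy sketch of instance-level Gaussian-to-Voxel splatting. It assumes the common 3D Gaussian splatting parameterization (mean, per-axis scale, unit quaternion, opacity), a covariance Sigma = R S S^T R^T, and an occupancy rule in which a voxel counts as occupied when the opacity-weighted sum of Gaussian densities at its center exceeds a threshold `tau`. The function names, the 3-sigma cropping and the threshold rule are illustrative assumptions, not GUIDE's published formulation. Because the Gaussians are continuous, `voxel_size` is just an argument, which mirrors how the resolution can be changed at inference time without retraining.

```python
import numpy as np


def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def splat_instance(means, scales, quats, opacities, voxel_size, tau=0.5):
    """Hypothetical Gaussian-to-Voxel splatting for ONE instance.

    means: (G, 3), scales: (G, 3), quats: (G, 4), opacities: (G,).
    Returns integer voxel indices (N, 3) whose summed density exceeds tau.
    """
    # Covariance of each Gaussian: Sigma = R S S^T R^T (assumed parameterization).
    rots = np.stack([quat_to_rotmat(q) for q in quats])   # (G, 3, 3)
    m = rots * scales[:, None, :]                          # R @ diag(s)
    inv_covs = np.linalg.inv(m @ m.transpose(0, 2, 1))     # (G, 3, 3)

    # Sparse: only visit voxels inside a box covering every Gaussian's
    # 3-sigma ellipsoid, never a scene-wide dense grid.
    reach = 3.0 * scales.max(axis=1, keepdims=True)        # (G, 1)
    lo = np.floor((means - reach).min(axis=0) / voxel_size).astype(int)
    hi = np.floor((means + reach).max(axis=0) / voxel_size).astype(int)
    axes = [np.arange(lo[i], hi[i] + 1) for i in range(3)]
    idx = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    centers = (idx + 0.5) * voxel_size                     # (V, 3)

    # Opacity-weighted sum of Gaussian densities at every voxel center.
    d = centers[:, None, :] - means[None, :, :]            # (V, G, 3)
    mahal = np.einsum("vgi,gij,vgj->vg", d, inv_covs, d)   # squared Mahalanobis
    density = (opacities[None, :] * np.exp(-0.5 * mahal)).sum(axis=1)
    return idx[density > tau]


# Usage: the same Gaussians rasterized at two resolutions.
means = np.zeros((1, 3))
scales = np.ones((1, 3))
quats = np.array([[1.0, 0.0, 0.0, 0.0]])
opacities = np.ones(1)
coarse = splat_instance(means, scales, quats, opacities, voxel_size=0.5)
fine = splat_instance(means, scales, quats, opacities, voxel_size=0.25)
print(len(coarse), len(fine))
```

The finer call reuses exactly the same Gaussian parameters; only the sampling grid changes, which is why a Gaussian representation decouples the learned instance geometry from the output voxel resolution.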

Abstract

In the realm of autonomous driving, accurately detecting surrounding obstacles is crucial for effective decision-making. Traditional methods primarily rely on 3D bounding boxes to represent these obstacles, which often fail to capture the complexity of irregularly shaped, real-world objects. To overcome these limitations, we present GUIDE, a novel framework that utilizes 3D Gaussians for instance detection and occupancy prediction. Unlike conventional occupancy prediction methods, GUIDE also offers robust tracking capabilities. Our framework employs a sparse representation strategy, using Gaussian-to-Voxel Splatting to provide fine-grained, instance-level occupancy data without the computational demands associated with dense voxel grids. Experimental validation on the nuScenes dataset demonstrates GUIDE's effectiveness: it achieves an instance occupancy mAP of 21.61, a 50% improvement over existing methods, alongside competitive tracking performance. GUIDE establishes a new benchmark in autonomous perception systems, effectively combining precision with computational efficiency to better address the complexities of real-world driving environments.
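
The cost argument against dense voxel grids can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the 80 m x 80 m x 6.4 m range at 0.4 m (a 200 x 200 x 16 grid, the layout of the Occ3D-nuScenes benchmark), 128-channel float32 features, 900 instances and 16 Gaussians per instance are assumed values, not figures reported for GUIDE, and only the stored representation is counted, not activations or gradients.

```python
def dense_grid_bytes(extent_m, voxel_size, channels, bytes_per_value=4):
    """Bytes for a dense feature volume covering extent_m = (x, y, z) in meters."""
    nx, ny, nz = (round(e / voxel_size) for e in extent_m)
    return nx * ny * nz * channels * bytes_per_value


def sparse_gaussian_bytes(num_instances, gaussians_per_instance, channels,
                          params_per_gaussian=11, bytes_per_value=4):
    """Bytes for per-Gaussian features plus geometry.

    11 geometric values per Gaussian is an assumption:
    mean (3) + scale (3) + quaternion (4) + opacity (1).
    """
    per_gaussian = channels + params_per_gaussian
    return num_instances * gaussians_per_instance * per_gaussian * bytes_per_value


MIB = 2 ** 20
extent = (80.0, 80.0, 6.4)  # assumed perception range in meters
for voxel in (0.4, 0.2):
    print(f"dense @ {voxel} m: {dense_grid_bytes(extent, voxel, 128) / MIB:.1f} MiB")
print(f"sparse: {sparse_gaussian_bytes(900, 16, 128) / MIB:.1f} MiB")
```

Under these assumptions the dense volume needs 312.5 MiB at 0.4 m and 2500.0 MiB at 0.2 m, since halving the voxel size multiplies the voxel count by 8, while the sparse Gaussian set needs about 7.6 MiB regardless of the voxel size used for splatting. Dense cost grows with scene volume and resolution; sparse cost grows with the number of instances and Gaussians.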

Paper Structure

This paper contains 33 sections, 6 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Framework of GUIDE. Instance queries and their anchors are first initialized and then iteratively updated through interactions with image features in the instance decoder. The updated top-k instances are combined with those in the historical instance bank to form a new candidate instance set. Each instance is then associated with multiple 3D Gaussians that serve as its representation. These Gaussians are refined iteratively by a 5-layer Gaussian Decoder. Instance occupancy predictions are then generated via Gaussian-to-Voxel Splatting, and aggregating the Gaussian features reconstructs instance-level representations from which each instance's bounding box and category are predicted. Additionally, the top-k instances update the instance bank, providing temporal information for inference on later frames. Meanwhile, unique IDs are assigned to instances in the instance bank whose confidence scores exceed a predefined threshold, enabling instance tracking across frames.
  • Figure 2: Visualization results for 3D instance occupancy prediction on nuScenes. Different colors denote the occupancy predictions of different instances. Background points are shown only to aid scene comprehension and are not used during model inference.
  • Figure 3: Visualization of occupancy predictions obtained with different voxel sizes in Gaussian-to-Voxel Splatting.
  • Figure 4: Visualization of 3D instance occupancy predictions with different numbers of Gaussians per instance.
  • Figure 5: Additional visualization results for 3D instance occupancy prediction on nuScenes. Different colors denote the occupancy predictions of different instances. Background points are shown only to aid scene comprehension and are not used during model inference.
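
The instance-bank flow described in the Figure 1 caption (top-k current instances merged with historical ones, top-k kept for later frames, IDs assigned above a confidence threshold) can be sketched in plain Python. Everything here is a hypothetical simplification: `Instance`, `InstanceBank`, `k` and `id_threshold` are illustrative names, and features, anchors, ego-motion compensation and the decoder's refinement of scores are omitted, so `update` simply re-ranks whatever scores the caller passes in.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    score: float                    # confidence after decoding
    track_id: Optional[int] = None  # assigned once the instance is confident


class InstanceBank:
    """Hypothetical temporal instance bank with threshold-based ID assignment."""

    def __init__(self, k: int, id_threshold: float):
        self.k = k
        self.id_threshold = id_threshold
        self.memory: List[Instance] = []  # instances carried to the next frame
        self._next_id = 0

    def _top_k(self, instances: List[Instance]) -> List[Instance]:
        return sorted(instances, key=lambda inst: inst.score, reverse=True)[: self.k]

    def candidates(self, current: List[Instance]) -> List[Instance]:
        # Current-frame top-k plus historical instances form the candidate set.
        return self._top_k(current) + self.memory

    def update(self, decoded: List[Instance]) -> None:
        # Keep the top-k candidates (after decoding) as memory for later frames.
        self.memory = self._top_k(decoded)
        # New IDs go only to confident instances that have none yet; instances
        # propagated from earlier frames keep their ID, which yields tracking.
        for inst in self.memory:
            if inst.track_id is None and inst.score > self.id_threshold:
                inst.track_id = self._next_id
                self._next_id += 1


bank = InstanceBank(k=2, id_threshold=0.5)
frame1 = bank.candidates([Instance(0.9), Instance(0.3), Instance(0.7)])
bank.update(frame1)
frame2 = bank.candidates([Instance(0.8), Instance(0.2)])
bank.update(frame2)
print([(inst.score, inst.track_id) for inst in bank.memory])  # [(0.9, 0), (0.8, 2)]
```

In this toy run the instance with ID 0 survives both frames and keeps its ID, the new confident detection receives ID 2, and the instance with ID 1 drops out of the top-k, ending its track. Because IDs travel with the stored instances, no separate frame-to-frame association step is needed in this simplified view.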