GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction

Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng, Yunpeng Zhang, Dalong Du, Jiwen Lu

TL;DR

GaussianFormer-2 tackles the inefficiency of dense 3D occupancy representations with a probabilistic Gaussian superposition: each Gaussian is treated as an occupancy distribution over its neighborhood, and the overall geometry is obtained by multiplying probabilities. Semantics are derived with a normalized Gaussian mixture model, which avoids unnecessary overlap among Gaussians and keeps logits bounded. A distribution-based initialization learns pixel-aligned occupancy distributions along camera rays to place Gaussians around occupied regions without LiDAR depth supervision. The approach achieves state-of-the-art results on nuScenes and KITTI-360 while using far fewer Gaussians, combining high accuracy with improved efficiency for vision-centric 3D scene understanding in autonomous driving.

Abstract

3D semantic occupancy prediction, which predicts the fine-grained geometry and semantics of the surrounding scene, is an important task for robust vision-centric autonomous driving. Most existing methods leverage dense grid-based scene representations, overlooking the spatial sparsity of driving scenes. Although 3D semantic Gaussians serve as an object-centric sparse alternative, most of the Gaussians still describe empty regions, which is inefficient. To address this, we propose a probabilistic Gaussian superposition model which interprets each Gaussian as a probability distribution of its neighborhood being occupied and conforms to probabilistic multiplication to derive the overall geometry. Furthermore, we adopt the exact Gaussian mixture model for semantics calculation to avoid unnecessary overlapping of Gaussians. To effectively initialize Gaussians in non-empty regions, we design a distribution-based initialization module which learns the pixel-aligned occupancy distribution instead of the depth of surfaces. We conduct extensive experiments on the nuScenes and KITTI-360 datasets, and our GaussianFormer-2 achieves state-of-the-art performance with high efficiency. Code: https://github.com/huang-yh/GaussianFormer.
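
To make the two aggregation rules in the abstract concrete, here is a minimal sketch in plain Python. It is not taken from the official repository. It assumes axis-aligned (diagonal-covariance) Gaussians, reads "probabilistic multiplication" as "a point is empty only if every Gaussian leaves it empty", and reads the "exact Gaussian mixture model" as a posterior-weighted average of per-Gaussian class probabilities. The dictionary keys `mean`, `scale`, `weight`, and `class_probs` are hypothetical names chosen for illustration.

```python
import math


def gaussian_occupancy(x, mean, scale):
    """Unnormalized Gaussian value in (0, 1]: the probability that point x
    is occupied according to one Gaussian (diagonal covariance assumed)."""
    d2 = sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mean, scale))
    return math.exp(-0.5 * d2)


def occupancy_probability(x, gaussians):
    """Geometry via probabilistic multiplication:
    P(occupied) = 1 - prod_i (1 - alpha_i(x)), which always lies in [0, 1]."""
    p_empty = 1.0
    for g in gaussians:
        p_empty *= 1.0 - gaussian_occupancy(x, g["mean"], g["scale"])
    return 1.0 - p_empty


def semantic_distribution(x, gaussians):
    """Semantics via a normalized Gaussian mixture:
    sum_i p(G_i | x) * c_i, where p(G_i | x) is the mixture posterior."""
    n_classes = len(gaussians[0]["class_probs"])
    weighted = []
    for g in gaussians:
        # Normalization constant of a 3D Gaussian with diagonal covariance.
        norm = (2.0 * math.pi) ** 1.5 * math.prod(g["scale"])
        density = gaussian_occupancy(x, g["mean"], g["scale"]) / norm
        weighted.append(g["weight"] * density)
    total = sum(weighted)
    if total == 0.0:
        # Far from every Gaussian: fall back to a uniform distribution.
        return [1.0 / n_classes] * n_classes
    return [
        sum(w * g["class_probs"][k] for w, g in zip(weighted, gaussians)) / total
        for k in range(n_classes)
    ]


# Two hypothetical Gaussians with two semantic classes each.
gaussians = [
    {"mean": (0.0, 0.0, 0.0), "scale": (1.0, 1.0, 1.0),
     "weight": 0.5, "class_probs": (0.9, 0.1)},
    {"mean": (2.0, 0.0, 0.0), "scale": (1.0, 1.0, 1.0),
     "weight": 0.5, "class_probs": (0.2, 0.8)},
]
query = (1.0, 0.0, 0.0)
occupancy = occupancy_probability(query, gaussians)
semantics = semantic_distribution(query, gaussians)
```

Because the geometry term is a product of per-Gaussian "empty" probabilities, adding more overlapping Gaussians can only push the occupancy toward 1 and never past it, and because the semantic weights are normalized posteriors, the predicted class distribution stays bounded regardless of how many Gaussians cover a point.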

Paper Structure

This paper contains 16 sections, 21 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: We approach efficient object-centric scene representation from a probabilistic perspective and propose the probabilistic Gaussian superposition model, which achieves SOTA performance with as few as 8.9% of the Gaussians used in GaussianFormer [huang2024gaussian].
  • Figure 2: Representation comparisons. Voxel- and plane-based representations inevitably incorporate emptiness when modeling the 3D scene. While GaussianFormer [huang2024gaussian] proposes 3D semantic Gaussians as a sparse representation, it still suffers from spatial redundancy. Our method achieves true object-centricity through probabilistic modeling.
  • Figure 3: Overall pipeline of our method. To achieve probabilistic modeling, we decompose occupancy prediction into geometry and semantics prediction, and approach them separately using probabilistic multiplication and Gaussian mixture model to improve efficiency.
  • Figure 4: Distribution-based initialization. Our initialization scheme learns pixel-aligned occupancy distributions from occupancy annotation, while the depth-based counterpart only captures the surfaces of objects and relies on LiDAR supervision.
  • Figure 5: Gaussian and occupancy visualizations on nuScenes. Our model is able to predict both comprehensive and realistic 3D Gaussians and occupancy.
  • ...and 4 more figures
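
The contrast drawn in the Figure 4 caption, between a pixel-aligned occupancy distribution and a single surface depth, can be sketched as follows. This is an illustrative assumption about how such a distribution might be turned into initial Gaussian positions, not the authors' implementation: the per-bin probabilities `occ_probs`, the depth bins, the `threshold` parameter, and both function names are hypothetical.

```python
import math


def _unit(direction):
    """Normalize a 3D direction vector."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [d / norm for d in direction]


def init_positions_from_ray(origin, direction, depth_bins, occ_probs, threshold=0.5):
    """Distribution-style initialization (assumed form): keep every depth bin
    along the camera ray whose predicted occupancy probability exceeds
    the threshold, so points can lie inside objects as well as on surfaces."""
    unit = _unit(direction)
    points = []
    for depth, prob in zip(depth_bins, occ_probs):
        if prob > threshold:
            points.append(tuple(o + depth * u for o, u in zip(origin, unit)))
    return points


def surface_only_position(origin, direction, depth_bins, occ_probs):
    """Depth-style baseline for comparison: a single point at the most
    likely bin, which captures one surface location per pixel."""
    unit = _unit(direction)
    best = max(range(len(depth_bins)), key=lambda i: occ_probs[i])
    return tuple(o + depth_bins[best] * u for o, u in zip(origin, unit))


# One hypothetical ray pointing along +z with four depth bins (in meters).
origin = (0.0, 0.0, 0.0)
direction = (0.0, 0.0, 2.0)
depth_bins = [1.0, 2.0, 3.0, 4.0]
occ_probs = [0.1, 0.9, 0.8, 0.2]

dense_points = init_positions_from_ray(origin, direction, depth_bins, occ_probs)
surface_point = surface_only_position(origin, direction, depth_bins, occ_probs)
```

In this toy ray, the distribution-style rule keeps both occupied bins while the depth-style baseline keeps only the single most likely one, which mirrors the caption's point that a depth-based scheme captures only object surfaces.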