GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction
Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng, Yunpeng Zhang, Dalong Du, Jiwen Lu
TL;DR
GaussianFormer-2 tackles the inefficiency of dense 3D occupancy representations with a probabilistic Gaussian superposition model: each Gaussian is treated as the probability distribution of its neighborhood being occupied, and the per-Gaussian probabilities are combined by probabilistic multiplication to obtain the overall geometry. Semantics are computed with a normalized Gaussian mixture model, which discourages unnecessary overlap between Gaussians and keeps the semantic logits bounded. A distribution-based initialization module learns pixel-aligned occupancy distributions along camera rays, so that Gaussians are placed around occupied regions without LiDAR depth supervision. The approach achieves state-of-the-art results on nuScenes and KITTI-360 while using far fewer Gaussians, improving both accuracy and efficiency for vision-centric 3D scene understanding in autonomous driving.
Abstract
3D semantic occupancy prediction is an important task for robust vision-centric autonomous driving: it predicts the fine-grained geometry and semantics of the surrounding scene. Most existing methods rely on dense grid-based scene representations and overlook the spatial sparsity of driving scenes. Although 3D semantic Gaussians serve as an object-centric sparse alternative, most of the Gaussians still describe empty regions, which is inefficient. To address this, we propose a probabilistic Gaussian superposition model which interprets each Gaussian as a probability distribution of its neighborhood being occupied and follows probabilistic multiplication to derive the overall geometry. Furthermore, we adopt the exact Gaussian mixture model for semantics calculation to avoid unnecessary overlapping of Gaussians. To effectively initialize Gaussians in non-empty regions, we design a distribution-based initialization module which learns the pixel-aligned occupancy distribution instead of the depth of surfaces. We conduct extensive experiments on the nuScenes and KITTI-360 datasets, and our GaussianFormer-2 achieves state-of-the-art performance with high efficiency. Code: https://github.com/huang-yh/GaussianFormer.
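The two ideas in the abstract, probabilistic multiplication for geometry and a normalized Gaussian mixture for semantics, can be illustrated with a small NumPy sketch. This is not the released implementation. It assumes that each Gaussian's occupancy probability at a point is the unnormalized Gaussian kernel exp(-d²/2), with d the Mahalanobis distance, and that a point is empty only if every Gaussian independently says it is empty. The function names, the `weights` prior, and the per-Gaussian `class_probs` are hypothetical and chosen only for illustration.

```python
import numpy as np


def _mahalanobis_sq(points, means, covs):
    """Squared Mahalanobis distance of each point to each Gaussian, shape (P, G)."""
    diff = points[:, None, :] - means[None, :, :]   # (P, G, 3)
    prec = np.linalg.inv(covs)                      # (G, 3, 3), one inverse per Gaussian
    return np.einsum("pgi,gij,pgj->pg", diff, prec, diff)


def occupancy_probability(points, means, covs):
    """Geometry by probabilistic multiplication (assumed form).

    Each Gaussian contributes alpha_i = exp(-0.5 * d^2) in (0, 1].
    A point is empty only if all Gaussians say empty, so
    P(occupied) = 1 - prod_i (1 - alpha_i).
    """
    alpha = np.exp(-0.5 * _mahalanobis_sq(points, means, covs))  # (P, G)
    return 1.0 - np.prod(1.0 - alpha, axis=1)                    # (P,)


def gmm_semantics(points, means, covs, weights, class_probs, eps=1e-12):
    """Semantics as a normalized Gaussian mixture (assumed form).

    Responsibilities p(G_i | x) sum to (almost) one over Gaussians, so the
    output stays in [0, 1] no matter how many Gaussians overlap at x.
    """
    maha = _mahalanobis_sq(points, means, covs)
    norm = np.sqrt((2.0 * np.pi) ** 3 * np.linalg.det(covs))     # (G,)
    density = np.exp(-0.5 * maha) / norm[None, :]                # p(x | G_i)
    resp = density * weights[None, :]                            # prior-weighted
    resp = resp / (resp.sum(axis=1, keepdims=True) + eps)        # p(G_i | x)
    return resp @ class_probs                                    # (P, C)
```

In this sketch, stacking more Gaussians on the same location cannot push the occupancy above one or inflate the semantic scores, which is the bounded behavior the abstract contrasts with plain additive superposition. Far from every Gaussian, the densities underflow and the semantic output falls to zero rather than to a uniform distribution; that is an artifact of the `eps` guard used here, not a claim about the paper's behavior.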
