GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting
Wanshui Gan, Fang Liu, Hongbin Xu, Ningkai Mo, Naoto Yokoya
TL;DR
GaussianOcc addresses the challenge of self-supervised surround-view 3D occupancy estimation without ground-truth poses. It introduces two Gaussian splatting innovations: GSP for cross-view scale learning and GSV for fast voxel-space rendering, enabling a two-stage training that yields competitive occupancy and depth results with substantial efficiency gains. The method achieves state-of-the-art self-supervised occupancy performance on nuScenes, demonstrates 3D occupancy on DDAD, and is reported to be 2.7 times faster in training and 5 times faster in rendering. Together, these contributions offer a practical, scalable solution for real-world surround-view perception under weak supervision.
Abstract
We introduce GaussianOcc, a systematic method that investigates two uses of Gaussian splatting for fully self-supervised and efficient 3D occupancy estimation in surround views. First, traditional methods for self-supervised 3D occupancy estimation still require ground truth 6D poses from sensors during training. To address this limitation, we propose the Gaussian Splatting for Projection (GSP) module, which provides accurate scale information for fully self-supervised training from adjacent view projection. Second, existing methods rely on volume rendering to learn the final 3D voxel representation from 2D signals (depth maps, semantic maps), which is both time-consuming and less effective. We propose Gaussian Splatting from Voxel space (GSV) to leverage the fast rendering properties of Gaussian splatting. As a result, the proposed GaussianOcc method enables fully self-supervised (no ground truth pose) 3D occupancy estimation with competitive performance and low computational cost (2.7 times faster in training and 5 times faster in rendering). The relevant code is available at https://github.com/GANWANSHUI/GaussianOcc.git.
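To make the idea behind GSV more concrete, the following is a minimal illustrative sketch of how a voxel grid could be turned into a set of Gaussians ready for splatting. It is not the authors' implementation: the `Gaussian` class, the `voxels_to_gaussians` function, the isotropic scale of half a voxel, and the `min_opacity` threshold are all assumptions made here for illustration. The abstract does not state how the Gaussian attributes are actually parametrized.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Gaussian:
    """Hypothetical container for one 3D Gaussian primitive."""
    mean: Tuple[float, float, float]  # center in world coordinates
    scale: float                      # isotropic standard deviation (assumed)
    opacity: float                    # value in [0, 1]


def voxels_to_gaussians(
    occupancy: Sequence[Sequence[Sequence[float]]],
    origin: Tuple[float, float, float],
    voxel_size: float,
    min_opacity: float = 0.05,
) -> List[Gaussian]:
    """Place one Gaussian at the center of each sufficiently occupied voxel.

    `occupancy[i][j][k]` is treated as the opacity of voxel (i, j, k).
    Voxels below `min_opacity` are skipped so that empty space yields
    no primitives (an assumed pruning rule, not taken from the paper).
    """
    gaussians: List[Gaussian] = []
    for i, plane in enumerate(occupancy):
        for j, row in enumerate(plane):
            for k, alpha in enumerate(row):
                if alpha < min_opacity:
                    continue
                # Voxel center = grid origin + (index + 0.5) * voxel size.
                x = origin[0] + (i + 0.5) * voxel_size
                y = origin[1] + (j + 0.5) * voxel_size
                z = origin[2] + (k + 0.5) * voxel_size
                gaussians.append(
                    Gaussian(mean=(x, y, z), scale=0.5 * voxel_size, opacity=alpha)
                )
    return gaussians


# Toy 2x2x2 grid with two occupied voxels.
grid = [
    [[0.0, 0.9], [0.0, 0.0]],
    [[0.0, 0.0], [0.6, 0.0]],
]
gaussians = voxels_to_gaussians(grid, origin=(-1.0, -1.0, 0.0), voxel_size=1.0)
for g in gaussians:
    print(g)
```

In a sketch like this, the resulting list would be handed to a Gaussian splatting rasterizer to render depth or semantic maps in each camera view. Because only occupied voxels produce primitives, the rendering cost depends on the number of Gaussians rather than on the number of samples taken along each ray, which is one plausible intuition for the rendering speedup the abstract reports.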
