GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

Wanshui Gan, Fang Liu, Hongbin Xu, Ningkai Mo, Naoto Yokoya

TL;DR

GaussianOcc addresses the challenge of self-supervised surround-view 3D occupancy estimation without ground-truth poses. It introduces two Gaussian splatting innovations: GSP for cross-view scale learning and GSV for fast voxel-space rendering, enabling a two-stage training that yields competitive occupancy and depth results with substantial efficiency gains. The method achieves state-of-the-art self-supervised occupancy performance on nuScenes, demonstrates 3D occupancy on DDAD, and reduces both training and rendering costs significantly. Together, these contributions offer a practical, scalable solution for real-world surround-view perception under weak supervision.

Abstract

We introduce GaussianOcc, a systematic method that investigates two uses of Gaussian splatting for fully self-supervised and efficient 3D occupancy estimation in surround views. First, traditional methods for self-supervised 3D occupancy estimation still require ground-truth 6D poses from sensors during training. To address this limitation, we propose the Gaussian Splatting for Projection (GSP) module, which provides accurate scale information for fully self-supervised training from adjacent-view projection. Second, existing methods rely on volume rendering to learn the final 3D voxel representation from 2D signals (depth maps, semantic maps), which is both time-consuming and less effective. We propose Gaussian Splatting from Voxel space (GSV) to leverage the fast rendering properties of Gaussian splatting. As a result, GaussianOcc enables fully self-supervised (no ground-truth pose) 3D occupancy estimation with competitive performance and low computational cost (2.7 times faster in training and 5 times faster in rendering). The code is available at https://github.com/GANWANSHUI/GaussianOcc.git.

Paper Structure (18 sections, 5 equations, 10 figures, 11 tables)

Figures (10)

  • Figure 1: Problem setting of GaussianOcc. Given a surround image sequence, the camera extrinsics, and the corresponding 2D semantic annotations, GaussianOcc performs 3D occupancy estimation without requiring ground-truth occupancy labels or ground-truth 6D ego poses for training.
  • Figure 2:
  • Figure 3: Overlap masks in nuScenes and DDAD.
  • Figure 4: Visualization of the rendered depth maps and 3D occupancy predictions on the nuScenes and DDAD datasets.
  • Figure 5: Comparison of the depth map and its synthesized overlap image for (1) direct bilinear-interpolation cross-view synthesis (SurroundDepth) and (2) our cross-view Gaussian splatting synthesis.
  • ...and 5 more figures