OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving

Yedong Shen, Xinran Zhang, Yifan Duan, Shiqi Zhang, Heng Li, Yilong Wu, Jianmin Ji, Yanyong Zhang

TL;DR

The paper tackles high-fidelity 3D scene reconstruction for autonomous driving from camera-only inputs. It introduces OG-Gaussian, which builds Occupancy Grids from surround-view images with an Occupancy Prediction Network, separates the static background from dynamic vehicles, learns dynamic object trajectories without manual annotations, and renders the scene with Gaussian Splatting. The approach achieves an average PSNR of 35.13 dB and renders at 143 FPS on the Waymo Open Dataset, matching LiDAR-based state-of-the-art methods while reducing cost and complexity. This enables fast, low-cost reconstruction suitable for autonomous driving simulation and planning.
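
The PSNR figure quoted here is the standard peak signal-to-noise ratio, PSNR = 10 log10(MAX^2 / MSE). As a reference point, the sketch below computes it for images assumed to be float arrays scaled to [0, 1]. The function name `psnr` and the NumPy dependency are illustrative choices, not taken from the paper, and the paper's exact evaluation protocol (resolution, cameras, averaging over frames) is not described in this summary.

```python
import numpy as np

def psnr(rendered: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    Assumes pixel values lie in [0, max_val]; max_val=1.0 for float images.
    """
    if rendered.shape != target.shape:
        raise ValueError("images must have the same shape")
    # Mean squared error over all pixels and channels, computed in float64.
    mse = np.mean((rendered.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.random((64, 64, 3))
    # Simulate a render with small Gaussian noise, clipped back to [0, 1].
    pred = np.clip(gt + rng.normal(0.0, 0.01, gt.shape), 0.0, 1.0)
    print(f"PSNR: {psnr(pred, gt):.2f} dB")
```

On a [0, 1] scale, a single image at 35.13 dB has a root-mean-square error of about 10^(-35.13/20), or roughly 0.0175 per pixel channel.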

Abstract

Accurate and realistic 3D scene reconstruction enables the lifelike creation of autonomous driving simulation environments. With advancements in 3D Gaussian Splatting (3DGS), previous studies have applied it to reconstruct complex dynamic driving scenes. These methods typically require expensive LiDAR sensors and pre-annotated datasets of dynamic objects. To address these challenges, we propose OG-Gaussian, a novel approach that replaces LiDAR point clouds with Occupancy Grids (OGs) generated from surround-view camera images by an Occupancy Prediction Network (ONet). Our method leverages the semantic information in OGs to separate dynamic vehicles from the static street background, converting these grids into two distinct sets of initial point clouds for reconstructing static and dynamic objects. Additionally, we estimate the trajectories and poses of dynamic objects through a learning-based approach, eliminating the need for complex manual annotations. Experiments on the Waymo Open Dataset demonstrate that OG-Gaussian is on par with the current state of the art in reconstruction quality and rendering speed, achieving an average PSNR of 35.13 dB and a rendering speed of 143 FPS, while significantly reducing computational costs and economic overhead.
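
The semantic separation step can be sketched as follows, assuming the ONet output is a dense 3D array of per-voxel semantic labels. The label IDs (`FREE_LABEL`, `DYNAMIC_LABELS`), voxel size, grid origin, and the function name are hypothetical placeholders, not the paper's conventions. Each occupied voxel simply becomes one point at its center, assigned to the dynamic set if its label is a vehicle class and to the static set otherwise.

```python
import numpy as np

# Hypothetical label conventions; real occupancy datasets define their own IDs.
FREE_LABEL = 0
DYNAMIC_LABELS = (1,)  # e.g. a "vehicle" class

def occupancy_to_point_clouds(labels, voxel_size, origin,
                              free_label=FREE_LABEL, dynamic_labels=DYNAMIC_LABELS):
    """Split a semantic occupancy grid into static and dynamic point clouds.

    labels: (X, Y, Z) int array of per-voxel semantic labels.
    voxel_size: voxel edge length in metres.
    origin: (3,) world coordinate of the outer corner of voxel (0, 0, 0).
    Returns (static_xyz, dynamic_xyz), each an (N, 3) array of voxel centres.
    """
    labels = np.asarray(labels)
    idx = np.argwhere(labels != free_label)               # (N, 3) occupied voxel indices
    cls = labels[idx[:, 0], idx[:, 1], idx[:, 2]]          # label of each occupied voxel
    centers = np.asarray(origin, dtype=np.float64) + (idx + 0.5) * voxel_size
    is_dynamic = np.isin(cls, dynamic_labels)              # vehicle voxels -> dynamic set
    return centers[~is_dynamic], centers[is_dynamic]

if __name__ == "__main__":
    grid = np.zeros((4, 4, 2), dtype=np.int64)
    grid[0, 0, 0] = 2  # some static class, e.g. road
    grid[1, 1, 0] = 1  # vehicle
    static_pts, dynamic_pts = occupancy_to_point_clouds(grid, 0.5, (0.0, 0.0, 0.0))
    print(static_pts, dynamic_pts)
```

This sketch does not group the dynamic points into individual vehicle instances, which per-object pose estimation would presumably also require.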

Paper Structure

This paper contains 12 sections, 11 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: OG-Gaussian reconstruction examples. (a) and (b) show our method reconstructing distant dynamic objects and a scene under rainy, low-light conditions. (c) and (d) show the results of the original 3DGS from the same viewpoints. Red bounding boxes mark the ground-truth locations of dynamic objects.
  • Figure 2: Overview of OG-Gaussian. OG-Gaussian uses a trained 3D Occupancy Prediction Network to obtain Occupancy Grid data for the scene and uses semantic information to separate static and dynamic objects into different initial point clouds. After reconstructing them separately, it renders static and dynamic objects jointly, producing 3D scenes, depth maps, and other outputs.
  • Figure 3: Initial point cloud generation. Dynamic vehicles are extracted from the street scene, then upsampled and projected to obtain a dense, colorized point cloud prior for dynamic vehicles. The street scene is converted directly into an initial background point cloud.
  • Figure 4: Qualitative comparison of reconstruction methods. Each column shows the results of one method, and each row shows the same viewpoint.
  • Figure 5: Visual ablation results on specific scenes, comparing different initialization methods.
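
Figure 3 describes densifying the extracted vehicle points and colorizing them by projection into camera images. This summary does not specify either step, so the following is a minimal sketch under stated assumptions: upsampling draws uniform random samples inside each occupied voxel, and colorization uses a single pinhole camera with intrinsics `K` and world-to-camera pose `R`, `t`, taking the color of the pixel each point lands in. Both function names are hypothetical. The sketch ignores occlusion (there is no depth test), so a hidden point still picks up the color of whatever surface is visible at its pixel; with surround-view cameras, each camera would be processed in turn and overlapping hits resolved by a rule not covered here.

```python
import numpy as np

def upsample_voxel_points(centers, voxel_size, samples_per_voxel, rng):
    """Densify voxel centres by uniform sampling inside each voxel (assumed scheme)."""
    centers = np.asarray(centers, dtype=np.float64)
    # Offsets in [-voxel_size/2, voxel_size/2) along each axis.
    offsets = rng.uniform(-0.5, 0.5, size=(centers.shape[0], samples_per_voxel, 3)) * voxel_size
    return (centers[:, None, :] + offsets).reshape(-1, 3)

def colorize_by_projection(points, image, K, R, t):
    """Colour world points by projecting them into one pinhole camera.

    points: (N, 3) world coordinates; image: (H, W, 3); K: (3, 3) intrinsics;
    R, t: world-to-camera rotation (3, 3) and translation (3,).
    Returns (kept_points, colors) for points in front of the camera and inside the image.
    No occlusion handling: every visible-pixel hit is accepted.
    """
    points = np.asarray(points, dtype=np.float64)
    cam = points @ R.T + t                    # world -> camera frame
    in_front = cam[:, 2] > 1e-6               # drop points behind the camera
    cam, pts = cam[in_front], points[in_front]
    uvw = cam @ K.T                           # camera -> homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]             # perspective divide
    u = np.floor(uv[:, 0]).astype(int)        # index of the pixel containing the projection
    v = np.floor(uv[:, 1]).astype(int)
    h, w = image.shape[:2]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return pts[inside], image[v[inside], u[inside]]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense = upsample_voxel_points(np.array([[0.0, 0.0, 5.0]]), 0.4, 8, rng)
    K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
    img = rng.random((64, 64, 3))
    kept, colors = colorize_by_projection(dense, img, K, np.eye(3), np.zeros(3))
    print(kept.shape, colors.shape)
```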