Industrial-Grade Sensor Simulation via Gaussian Splatting: A Modular Framework for Scalable Editing and Full-Stack Validation

Xianming Zeng, Sicong Du, Qifeng Chen, Lizhe Liu, Haoyu Shu, Jiaxuan Gao, Jiarun Liu, Jiulong Xu, Jianyun Xu, Mingxia Chen, Yiru Zhao, Peng Chen, Yapeng Xue, Chunming Zhao, Sheng Yang, Qiang Li

TL;DR

This work addresses the need for scalable, realistic sensor simulation in autonomous driving by replacing NeRF-based pipelines with Gaussian Splatting (GS), which provides an explicit scene representation and real-time rendering. It introduces a modular GS-based framework with three components: (1) a 2D Gaussian scene representation with parallelized training, (2) an explicit scene-editing pipeline for object insertion and harmonization, and (3) diffusion-model-driven scene expansion that generates physically coherent novel viewpoints. The approach is validated on a proprietary multi-sensor dataset (cameras and LiDAR). Ablations show improved rendering fidelity, reduced latency, and interpretable editing, and perception performance improves when the augmented data is used for training. Finally, the framework is coupled with traffic and dynamic simulators for full-stack validation with sim-to-real testing, closing the loop from perception to planning and control and enabling robust end-to-end autonomy validation.

Abstract

Sensor simulation is pivotal for scalable validation of autonomous driving systems, yet existing methods based on Neural Radiance Fields (NeRF) face applicability and efficiency challenges in industrial workflows. This paper introduces a Gaussian Splatting (GS) based system to address these challenges. We first break down the components of a sensor simulator and analyze the possible advantages of GS over NeRF. In practice, we then refactor three crucial components with GS to leverage its explicit scene representation and real-time rendering: (1) choosing the 2D neural Gaussian representation for physics-compliant scene and sensor modeling, (2) proposing a scene editing pipeline that leverages a library of Gaussian primitives for data augmentation, and (3) coupling a controllable diffusion model for scene expansion and harmonization. We implement this framework on a proprietary autonomous driving dataset supporting camera and LiDAR sensors. We demonstrate through ablation studies that our approach reduces frame-wise simulation latency, achieves better geometric and photometric consistency, and enables interpretable explicit scene editing and expansion. Furthermore, we showcase how integrating such a GS-based sensor simulator with traffic and dynamic simulators enables full-stack testing of end-to-end autonomy algorithms. Our work provides both algorithmic insights and practical validation, establishing GS as a cornerstone for industrial-grade sensor simulation.

Paper Structure

This paper contains 10 sections, 7 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Key refactorable components (green) and applications (blue) of the sensor simulator where GS is leveraged to improve performance.
  • Figure 2: The workflow of our proposed GS-based sensor simulator. Given an input driving clip, we first reconstruct various sensors into their corresponding neural Gaussians, where editing, expansion, and harmonization operations can be applied to manipulate all reconstructed Gaussians synchronously. In application phases, it can either be used to enlarge the training set of multi-type perception tasks, or link with other simulators for end-to-end testing.
  • Figure 3: We use a unified projection scheme to consistently process pinhole frames (left), fisheye frames (middle), and range-view LiDAR frames (right).
  • Figure 4: Representative GS objects rendered from different perspectives.
  • Figure 5: Qualitative comparisons of dragging-in reconstructed models (middle) and using a camera diffusion model to refine their harmony (right).
  • ...and 3 more figures
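
The Figure 3 caption mentions one projection scheme shared by pinhole, fisheye, and range-view LiDAR frames. The paper's actual formulation is not given in this summary, so the following is only a minimal sketch of what projecting 3D points under three sensor models behind one interface might look like. The equidistant fisheye model, the spherical range-view layout, the axis conventions, and all function and parameter names are assumptions made for illustration.

```python
import numpy as np

def project_pinhole(points, fx, fy, cx, cy):
    """Perspective projection of camera-frame points (N, 3), z pointing forward."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([fx * x / z + cx, fy * y / z + cy], axis=1)

def project_fisheye_equidistant(points, f, cx, cy):
    """Equidistant fisheye (assumed model): image radius r = f * theta."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.hypot(x, y)                 # distance from the optical axis
    theta = np.arctan2(rho, z)           # angle from the optical axis
    # Points on the optical axis map to the principal point.
    scale = np.where(rho > 1e-12, f * theta / np.maximum(rho, 1e-12), 0.0)
    return np.stack([scale * x + cx, scale * y + cy], axis=1)

def project_range_view(points, width, height, elev_min, elev_max):
    """Spherical range view (assumed layout): column from azimuth, row from elevation.

    Sensor frame assumed x forward, y left, z up. Returns pixel coords and range.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points, axis=1)
    azimuth = np.arctan2(y, x)                       # in [-pi, pi]
    elevation = np.arcsin(z / rng)
    u = 0.5 * (1.0 - azimuth / np.pi) * width        # azimuth 0 -> image centre
    v = (elev_max - elevation) / (elev_max - elev_min) * height
    return np.stack([u, v], axis=1), rng

def project(points, sensor):
    """Dispatch on a hypothetical sensor description dict with a 'model' key."""
    params = {k: v for k, v in sensor.items() if k != "model"}
    if sensor["model"] == "pinhole":
        return project_pinhole(points, **params)
    if sensor["model"] == "fisheye":
        return project_fisheye_equidistant(points, **params)
    if sensor["model"] == "range_view":
        return project_range_view(points, **params)[0]
    raise ValueError(f"unknown sensor model: {sensor['model']}")
```

With an interface of this kind, a rasterizer only needs the per-sensor mapping from 3D points to image coordinates, which is one plausible reading of how a single scene representation could serve both cameras and LiDAR.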