SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving

Georg Hess, Carl Lindström, Maryam Fatemi, Christoffer Petersson, Lennart Svensson

TL;DR

SplatAD tackles scalable, realistic autonomous driving simulation by unifying camera and lidar rendering within a single 3D Gaussian Splatting framework. It introduces sensor-specific adaptations, including rolling shutter compensation and lidar tiling in spherical coordinates, implemented as CUDA-accelerated rasterization for real-time performance. The approach achieves state-of-the-art image and lidar rendering quality across three automotive datasets while rendering an order of magnitude faster than NeRF-based methods. This makes data-driven simulation more practical for large-scale safety testing and scenario exploration in autonomous driving.

Abstract

Ensuring the safety of autonomous robots, such as self-driving vehicles, requires extensive testing across diverse driving scenarios. Simulation is a key ingredient for conducting such testing in a cost-effective and scalable way. Neural rendering methods have gained popularity, as they can build simulation environments from collected logs in a data-driven manner. However, existing neural radiance field (NeRF) methods for sensor-realistic rendering of camera and lidar data suffer from low rendering speeds, limiting their applicability for large-scale testing. While 3D Gaussian Splatting (3DGS) enables real-time rendering, current methods are limited to camera data and are unable to render lidar data essential for autonomous driving. To address these limitations, we propose SplatAD, the first 3DGS-based method for realistic, real-time rendering of dynamic scenes for both camera and lidar data. SplatAD accurately models key sensor-specific phenomena such as rolling shutter effects, lidar intensity, and lidar ray dropouts, using purpose-built algorithms to optimize rendering efficiency. Evaluation across three autonomous driving datasets demonstrates that SplatAD achieves state-of-the-art rendering quality with up to +2 PSNR for NVS and +3 PSNR for reconstruction while increasing rendering speed over NeRF-based methods by an order of magnitude. See https://research.zenseact.com/publications/splatad/ for our project page.
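
To make the idea of lidar tiling in spherical coordinates more concrete, the sketch below maps a 3D point in the lidar frame to azimuth, elevation, and range, and then to a tile on an azimuth/elevation grid. It is only an illustration under stated assumptions, not the authors' CUDA implementation: the function name `lidar_tile_index`, the uniform angular resolutions, the tile size, and the minimum elevation are all hypothetical values chosen for the example.

```python
import math


def lidar_tile_index(point, az_res_deg=0.25, el_res_deg=0.5,
                     tile_w=16, tile_h=16, el_min_deg=-25.0):
    """Map a 3D point (lidar frame) to spherical coordinates and a tile index.

    All default parameters are hypothetical; a real sensor has its own
    angular resolution and field of view.
    """
    x, y, z = point
    rng = math.sqrt(x * x + y * y + z * z)
    if rng == 0.0:
        raise ValueError("a point at the sensor origin has no direction")

    azimuth = math.degrees(math.atan2(y, x))      # in [-180, 180]
    elevation = math.degrees(math.asin(z / rng))  # in [-90, 90]

    # Continuous grid coordinates; azimuth wraps around at 360 degrees.
    col = ((azimuth + 180.0) % 360.0) / az_res_deg
    row = (elevation - el_min_deg) / el_res_deg

    # Integer tile coordinates, analogous to image tiles in camera rasterization.
    tile = (int(col // tile_w), int(row // tile_h))
    return azimuth, elevation, rng, tile


if __name__ == "__main__":
    az, el, r, tile = lidar_tile_index((10.0, 0.0, 0.0))
    print(az, el, r, tile)
```

Working on an azimuth/elevation grid rather than an image plane lets a 360-degree sweep be divided into tiles in the same way a camera image is, which is the property the tiling step relies on.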

Paper Structure

This paper contains 20 sections, 23 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: SplatAD is the first method capable of realistic camera and lidar rendering using 3D Gaussian Splatting. Whereas previous methods are either fast or multi-modal, SplatAD enables real-time, high-quality rendering for both camera and lidar. In addition, SplatAD can reach competitive performance, e.g., PSNR for image and Chamfer distance for point cloud, within minutes.
  • Figure 2: Overview of our proposed method. Given the composition of static and dynamic 3D Gaussians, SplatAD is capable of differentiable rendering of both lidar and camera data. Our proposed lidar rendering matches the image rendering on a high level, but modifies each component to accurately model sensor characteristics. Our method projects 3D Gaussians with associated feature vectors onto the corresponding sensor modalities (camera and lidar) and employs sensor-specific tiling to match their distinct characteristics. During rasterization, the projected Gaussians are corrected for rolling shutter effects caused by the movement of sensors and, potentially, their own velocity. Finally, the rasterized features are decoded into the respective image and lidar point cloud representations.
  • Figure 3: Compared to the baselines, SplatAD produces sharp images with a high level of detail. Further, the bottom row highlights the superiority of our lidar rendering. Projecting lidar points into images for depth supervision, as used by previous 3DGS methods, causes line-of-sight errors and incorrect volume carving due to the pose differences between camera and lidar.
  • Figure 4: Removing our rolling shutter compensation leads to inaccurate geometry and inconsistent learning.
  • Figure 5: The CNN decoder improves sharpness and is more true to color than the MLP decoder.
  • ...and 5 more figures
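
The rolling shutter correction mentioned in the Figure 2 caption can be illustrated with a first-order sketch: each image row is captured at a slightly different time, so a projected Gaussian's mean is shifted by its pixel-space velocity multiplied by that row's time offset. The code below is a simplified illustration, not the paper's rasterizer. It assumes a linear top-to-bottom readout centered on the middle row, and the names `row_time_offset`, `shifted_mean`, and `readout_time_s` are hypothetical.

```python
def row_time_offset(row, image_height, readout_time_s):
    """Capture time of a row relative to the middle row (linear readout assumed)."""
    return (row / (image_height - 1) - 0.5) * readout_time_s


def shifted_mean(mean_px, velocity_px_per_s, row, image_height, readout_time_s):
    """Shift a projected 2D mean to where it appears when `row` is captured.

    velocity_px_per_s is the apparent pixel-space velocity of the Gaussian,
    which would combine sensor motion and the object's own motion.
    """
    dt = row_time_offset(row, image_height, readout_time_s)
    return (mean_px[0] + velocity_px_per_s[0] * dt,
            mean_px[1] + velocity_px_per_s[1] * dt)


if __name__ == "__main__":
    # A Gaussian moving 100 px/s to the right, evaluated at the top,
    # middle, and bottom rows of a 101-row image with 30 ms readout.
    for row in (0, 50, 100):
        print(row, shifted_mean((200.0, 50.0), (100.0, 0.0), row, 101, 0.03))
```

Under these assumptions the middle row sees no shift, while the top and bottom rows see equal and opposite shifts, which is why ignoring the effect distorts geometry for fast-moving sensors or objects.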