
TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving

Cheng Zhao, Su Sun, Ruoyu Wang, Yuliang Guo, Jun-Jun Wan, Zhou Huang, Xinyu Huang, Yingjie Victor Chen, Liu Ren

TL;DR

This work tackles fast, high-fidelity 3D reconstruction and novel-view rendering in urban driving scenes by fusing LiDAR and camera data. It introduces TCLC-GS, a tightly coupled LiDAR-Camera Gaussian Splatting framework that uses a hybrid explicit (colorized 3D mesh) and implicit (hierarchical octree features) representation to initialize and optimize 3D Gaussians, with dense depth supervision rendered from the mesh. The method achieves state-of-the-art image and depth synthesis on Waymo and nuScenes while rendering in real time on a single RTX 3090 Ti (90 FPS at 1920x1280 on Waymo, 120 FPS at 1600x900 on nuScenes). Its key contributions are colorized mesh-guided Gaussian initialization, an octree implicit feature grid that supplies appearance context, and dense mesh-based depth supervision; together these enable robust large-scale urban reconstruction for autonomous driving applications.
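As a rough illustration of mesh-guided initialization, the sketch below creates one Gaussian per colorized mesh vertex. It loosely follows the conventions of the reference 3D-GS code: colors are stored as zeroth-order spherical-harmonic (DC) coefficients, isotropic scales come from the root-mean-square distance to the 3 nearest neighbours, rotations start as the identity quaternion, and opacity starts at 0.1. Applying these conventions to mesh vertices, and the names `Gaussian`, `rms_knn_distance`, and `init_gaussians_from_mesh`, are assumptions for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass

# Zeroth-order spherical-harmonic basis constant: Y_0^0 = 1 / (2 * sqrt(pi)).
SH_C0 = 0.28209479177387814


@dataclass
class Gaussian:
    mean: tuple            # 3D center (x, y, z), taken from a mesh vertex
    sh_dc: tuple           # DC spherical-harmonic color coefficients (r, g, b)
    log_scale: tuple       # per-axis log scale, isotropic at initialization
    rotation: tuple        # unit quaternion (w, x, y, z)
    opacity_logit: float   # opacity before the sigmoid


def rgb_to_sh_dc(rgb):
    """Map an RGB color in [0, 1] to DC SH coefficients (inverse of 0.5 + C0 * c)."""
    return tuple((c - 0.5) / SH_C0 for c in rgb)


def rms_knn_distance(points, i, k=3):
    """Root-mean-square distance from points[i] to its k nearest neighbours (brute force)."""
    sq = sorted(
        sum((a - b) ** 2 for a, b in zip(points[i], p))
        for j, p in enumerate(points) if j != i
    )[:k]
    return math.sqrt(sum(sq) / len(sq))


def init_gaussians_from_mesh(vertices, colors, init_opacity=0.1, k=3):
    """Create one Gaussian per colorized mesh vertex (hypothetical helper)."""
    logit = math.log(init_opacity / (1.0 - init_opacity))  # inverse sigmoid
    gaussians = []
    for i, (v, c) in enumerate(zip(vertices, colors)):
        # Clamp so that duplicate vertices do not produce log(0).
        s = max(rms_knn_distance(vertices, i, k), 1e-7)
        gaussians.append(Gaussian(
            mean=tuple(v),
            sh_dc=rgb_to_sh_dc(c),
            log_scale=(math.log(s),) * 3,
            rotation=(1.0, 0.0, 0.0, 0.0),
            opacity_logit=logit,
        ))
    return gaussians


verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
cols = [(0.5, 0.5, 0.5), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
gaussians = init_gaussians_from_mesh(verts, cols)
print(len(gaussians), gaussians[0].sh_dc)  # 4 (0.0, 0.0, 0.0)
```

The brute-force neighbour search is quadratic in the number of vertices; for a city-scale mesh it would be replaced by a KD-tree or GPU k-nearest-neighbour query.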

Abstract

Most 3D Gaussian Splatting (3D-GS) based methods for urban scenes initialize 3D Gaussians directly with 3D LiDAR points, which not only underutilizes the capabilities of LiDAR data but also overlooks the potential advantages of fusing LiDAR with camera data. In this paper, we design TCLC-GS, a novel tightly coupled LiDAR-Camera Gaussian Splatting method that fully leverages the combined strengths of LiDAR and camera sensors, enabling rapid, high-quality 3D reconstruction and novel-view RGB/depth synthesis. TCLC-GS builds a hybrid explicit (colorized 3D mesh) and implicit (hierarchical octree feature) 3D representation from LiDAR-camera data to enrich the properties of the 3D Gaussians used for splatting. These properties are not only initialized in alignment with the 3D mesh, which provides more complete 3D shape and color information, but are also endowed with broader contextual information through retrieved octree implicit features. During Gaussian Splatting optimization, the 3D mesh provides dense depth supervision, which helps training learn a robust geometry. Comprehensive evaluations on the Waymo Open Dataset and the nuScenes dataset validate the state-of-the-art (SOTA) performance of our method. On a single NVIDIA RTX 3090 Ti, our method trains quickly and renders RGB and depth in real time in urban scenarios: 90 FPS at a resolution of 1920x1280 (Waymo) and 120 FPS at 1600x900 (nuScenes).
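The retrieval of octree implicit features can be made concrete with a small multi-level feature lookup. This sketch assumes a common design for hierarchical feature grids: learnable feature vectors stored at voxel corners on several levels (each finer level halving the voxel size), trilinearly interpolated at the query point and summed across levels. Plain Python dictionaries keyed by integer corner indices stand in for the octree, and `query_hierarchical_features` is a hypothetical name; the paper's exact storage and aggregation may differ.

```python
import itertools
import math


def trilinear_weights(local):
    """Yield (corner_offset, weight) pairs for the 8 corners of a unit cell.

    `local` holds the fractional position of the query point inside the cell.
    """
    for corner in itertools.product((0, 1), repeat=3):
        w = 1.0
        for axis in range(3):
            w *= local[axis] if corner[axis] else 1.0 - local[axis]
        yield corner, w


def query_hierarchical_features(point, grids, base_voxel_size, feat_dim):
    """Sum trilinearly interpolated corner features over all levels (hypothetical).

    grids[level] maps an integer corner index (i, j, k) to a feature list.
    Level 0 is the coarsest; each finer level halves the voxel size.
    Corners missing from a level (unallocated regions) contribute nothing.
    """
    out = [0.0] * feat_dim
    for level, grid in enumerate(grids):
        voxel = base_voxel_size / (2 ** level)
        scaled = [c / voxel for c in point]
        base = [math.floor(s) for s in scaled]
        local = [s - b for s, b in zip(scaled, base)]
        for corner, w in trilinear_weights(local):
            feat = grid.get(tuple(b + o for b, o in zip(base, corner)))
            if feat is None:
                continue
            for d in range(feat_dim):
                out[d] += w * feat[d]
    return out


# Toy two-level grid: coarse features vary linearly in x, fine features are constant.
cells = list(itertools.product(range(-2, 4), repeat=3))
coarse = {key: [float(key[0]), 0.0] for key in cells}
fine = {key: [0.0, 1.0] for key in cells}
feature = query_hierarchical_features((0.25, 0.5, 0.75), [coarse, fine], 1.0, 2)
print(feature)
```

Under this sketch, querying the function at each mesh vertex would give a per-Gaussian context vector; how TCLC-GS decodes that vector into appearance attributes is not specified here.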

Paper Structure

This paper contains 8 sections, 12 equations, 11 figures, and 9 tables.

Figures (11)

  • Figure 1: Left: methods based on the original 3D-GS (kerbl20233d) directly initialize 3D Gaussians with 3D LiDAR points; Right: our TCLC-GS enriches the geometry and appearance attributes of 3D Gaussians with explicit (colorized 3D mesh) and implicit (hierarchical octree feature) representations.
  • Figure 2: The pipeline of TCLC-GS. We first merge all LiDAR sweeps, then build a hierarchical octree implicit feature grid from 3D points sampled within the truncation region along the LiDAR rays. These octree implicit features are trained with SDF and RGB decoders, supervised by sparse LiDAR range measurements and by RGB colors projected from the surrounding camera images. This yields optimized octree implicit representations and a colorized 3D mesh of the global scene. The geometry attributes of the 3D Gaussians are initialized from the 3D mesh, while their appearance attributes are enriched by octree implicit features retrieved at the mesh vertices. The 3D Gaussians are then optimized through depth and RGB splatting with dense depth and color supervision. Unlike sparse depth supervision derived from LiDAR, our dense depth supervision is rendered from the 3D mesh by ray tracing.
  • Figure 3: Visualization of our colorized 3D mesh and dense depths. Row 1: dense surrounding-view depth images rendered from the 3D mesh at the given camera poses; Row 2: the colorized 3D mesh generated from the octree implicit representation.
  • Figure 4: Visual comparison of image and depth synthesis from novel front-left, front, and front-right surrounding views on the Waymo dataset. Row 1: 3D-GS images; Row 2: TCLC-GS images; Row 3: GT images; Row 4: 3D-GS depths; Row 5: TCLC-GS depths; Row 6: GT depth from LiDAR points projected onto the images.
  • Figure 5: Visual comparison of image synthesis from novel views on the nuScenes dataset. Row 1: 3D-GS; Row 2: TCLC-GS; Row 3: GT.
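Figure 2 states that the dense depth supervision is rendered from the 3D mesh by ray tracing. A minimal pure-Python sketch of that idea, assuming a pinhole camera and a mesh already transformed into camera coordinates, casts one ray per pixel, keeps the nearest Möller-Trumbore triangle hit as the z-depth, and compares a splatted depth map against it with an L1 loss restricted to pixels the mesh covers. The brute-force intersection, the masked L1 form of the loss, and all function names are assumptions; a practical system would use an accelerated ray tracer (for example, one built on a BVH).

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle_t(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore ray/triangle test; returns the hit parameter t or None."""
    v0, v1, v2 = tri
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(direction, e2)
    det = _dot(e1, p)
    if abs(det) < eps:                 # ray parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    s = _sub(origin, v0)
    u = _dot(s, p) * inv_det           # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(s, e1)
    v = _dot(direction, q) * inv_det   # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(e2, q) * inv_det
    return t if t > eps else None      # keep only hits in front of the camera


def render_mesh_depth(triangles, fx, fy, cx, cy, width, height):
    """Ray-trace a z-depth image of a mesh expressed in camera coordinates.

    The ray through pixel centre (col + 0.5, row + 0.5) has direction
    ((x - cx) / fx, (y - cy) / fy, 1); since its z component is 1, the ray
    parameter t of the nearest hit equals the z-depth. Misses stay 0 (invalid).
    """
    origin = (0.0, 0.0, 0.0)
    depth = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            d = ((col + 0.5 - cx) / fx, (row + 0.5 - cy) / fy, 1.0)
            hits = [ray_triangle_t(origin, d, tri) for tri in triangles]
            hits = [t for t in hits if t is not None]
            if hits:
                depth[row][col] = min(hits)
    return depth


def masked_l1_depth_loss(pred, target):
    """Mean absolute depth error over pixels where the mesh depth is valid (> 0)."""
    errors = [abs(p - t)
              for pred_row, tgt_row in zip(pred, target)
              for p, t in zip(pred_row, tgt_row) if t > 0.0]
    return sum(errors) / len(errors) if errors else 0.0


# A single large triangle on the plane z = 5 fills a 4x4 view.
tri = ((-100.0, -100.0, 5.0), (100.0, -100.0, 5.0), (0.0, 200.0, 5.0))
mesh_depth = render_mesh_depth([tri], fx=2.0, fy=2.0, cx=2.0, cy=2.0, width=4, height=4)
splatted_depth = [[4.0] * 4 for _ in range(4)]  # stand-in for a Gaussian-splatted depth map
print(masked_l1_depth_loss(splatted_depth, mesh_depth))
```

Because every pixel receives a mesh depth wherever the surface is visible, this target is dense, in contrast to projected LiDAR points, which cover only a sparse subset of pixels.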