
3DGS-Calib: 3D Gaussian Splatting for Multimodal SpatioTemporal Calibration

Quentin Herau, Moussab Bennehar, Arthur Moreau, Nathan Piasco, Luis Roldao, Dzmitry Tsishkou, Cyrille Migniot, Pascal Vasseur, Cédric Demonceaux

TL;DR

The paper tackles fast, accurate multimodal spatiotemporal calibration of LiDAR and cameras by replacing slow NeRF-based representations with 3D Gaussian Splatting. It introduces 3DGS-Calib. The method uses Gaussians derived from LiDAR points as a fixed geometric scaffold and a shared MLP to predict per-Gaussian parameters. The network and the calibration parameters are optimized jointly through a differentiable photometric loss that aligns the multi-sensor data. Efficiency measures such as downsampling the accumulated LiDAR point cloud, progressive voxelization, image cropping, and scale regularization yield substantial speed-ups while maintaining or improving accuracy, as demonstrated on KITTI-360 sequences. The approach outperforms NeRF-based methods and classical baselines in both spatiotemporal and LiDAR-camera calibration tasks, highlighting its practical potential for online, in-the-wild sensor fusion.
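The TL;DR mentions downsampling of the accumulated LiDAR cloud and progressive voxelization but gives no details. Below is a minimal sketch under two assumptions. First, downsampling is a voxel grid that replaces the points in each voxel by their centroid. Second, "progressive" means a coarse-to-fine schedule of voxel sizes during training, which is consistent with Figure 3, where finer scene details need a smaller voxel size. The function names, voxel sizes, and stage length are hypothetical and are not the paper's settings.

```python
import numpy as np


def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Replace all points falling in the same voxel by their centroid.

    Hypothetical helper: a plain voxel-grid filter, not the paper's code.
    """
    # Integer voxel index of every point
    keys = np.floor(points / voxel_size).astype(np.int64)
    # inverse[i] is the id of the (sorted) unique voxel that point i falls in
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)  # guard against NumPy versions returning (N, 1)
    sums = np.zeros((counts.shape[0], points.shape[1]))
    np.add.at(sums, inverse, points)  # accumulate coordinates per voxel
    return sums / counts[:, None]


def progressive_schedule(points: np.ndarray, voxel_sizes, iters_per_stage: int):
    """Yield (start_iteration, cloud) pairs, coarse voxels first (assumed schedule)."""
    for stage, size in enumerate(voxel_sizes):
        yield stage * iters_per_stage, voxel_downsample(points, size)
```

For example, `progressive_schedule(cloud, [1.0, 0.5, 0.2], 1000)` would start training with a sparse set of Gaussian positions and densify them at iterations 1000 and 2000. Because the Gaussians' positions come from the LiDAR, shrinking the voxel size is one way to add geometric detail without letting the positions themselves drift.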

Abstract

Reliable multimodal sensor fusion algorithms require accurate spatiotemporal calibration. Recently, targetless calibration techniques based on implicit neural representations have proven to provide precise and robust results. Nevertheless, such methods are inherently slow to train given the high computational overhead caused by the large number of sampled points required for volume rendering. With the recent introduction of 3D Gaussian Splatting as a faster alternative to implicit representation methods, we propose to leverage this new rendering approach to achieve faster multi-sensor calibration. We introduce 3DGS-Calib, a new calibration method that relies on the speed and rendering accuracy of 3D Gaussian Splatting to achieve multimodal spatiotemporal calibration that is accurate, robust, and with a substantial speed-up compared to methods relying on implicit neural representations. We demonstrate the superiority of our proposal with experimental results on sequences from KITTI-360, a widely used driving dataset.

Paper Structure

This paper contains 18 sections, 8 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: Pipeline of 3DGS-Calib: The Gaussians' positions are given as input to the neural network which predicts their parameters. In parallel, the calibration parameters provide the input pose that transforms the Gaussians from the world frame to the image frame. Then, the 3D Gaussians are splatted using their predicted parameters to generate the rendered image. This image is compared to its ground-truth (GT) counterpart to compute the photometric loss. Finally, the gradients are backpropagated to the neural network and the calibration parameters.
  • Figure 2: Results for MOISST, MOISST w/ cropping, and 3DGS-Calib as box plots. The red lines show the initial error (best viewed in color).
  • Figure 3: Rendering results with different voxel sizes: The finer details of the scene require a smaller voxel size to be learned.
  • Figure 4: LiDAR/Camera calibration results: Point cloud to camera reprojection results obtained from the compared methods.
  • Figure 5: MOISST rendering with LiDAR/Camera calibration: MOISST fails to merge the geometry of the trees and cars from the LiDAR with the geometry built with the images.
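Figure 1 says the calibration parameters provide the pose that transforms the Gaussians from the world frame to the image frame. Below is a minimal forward-pass sketch of that step. It rests on assumptions that are not stated here. The LiDAR trajectory is taken as known, in the form of timestamped world-from-LiDAR poses. The calibration parameters are assumed to be a rigid transform `T_lidar_cam` (camera frame to LiDAR frame) and a scalar time offset. The camera is assumed to be a pinhole model with z pointing forward. In the actual method these parameters would be differentiable tensors, so that photometric-loss gradients can update them. This NumPy version only evaluates the geometry, and every function name in it is hypothetical.

```python
import numpy as np


def skew(w):
    """3x3 matrix such that skew(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])


def so3_exp(w):
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    K = skew(w)
    if theta < 1e-12:
        return np.eye(3) + K
    return np.eye(3) + np.sin(theta) / theta * K + (1 - np.cos(theta)) / theta**2 * K @ K


def so3_log(R):
    """Rotation matrix -> axis-angle vector (not robust for angles near pi)."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return v * theta / (2.0 * np.sin(theta))


def interpolate_pose(times, poses, t):
    """World-from-LiDAR 4x4 pose at time t (geodesic rotation, linear translation)."""
    i = int(np.clip(np.searchsorted(times, t) - 1, 0, len(times) - 2))
    alpha = np.clip((t - times[i]) / (times[i + 1] - times[i]), 0.0, 1.0)
    T0, T1 = poses[i], poses[i + 1]
    T = np.eye(4)
    T[:3, :3] = T0[:3, :3] @ so3_exp(alpha * so3_log(T0[:3, :3].T @ T1[:3, :3]))
    T[:3, 3] = (1 - alpha) * T0[:3, 3] + alpha * T1[:3, 3]
    return T


def camera_pose(lidar_times, lidar_poses, t_cam, time_offset, T_lidar_cam):
    """World-from-camera pose from the assumed spatial and temporal parameters."""
    return interpolate_pose(lidar_times, lidar_poses, t_cam + time_offset) @ T_lidar_cam


def project(points_world, T_world_cam, K):
    """Project world points (N, 3), e.g. Gaussian means, to pixels (N, 2)."""
    R, t = T_world_cam[:3, :3], T_world_cam[:3, 3]
    p_cam = (points_world - t) @ R  # row-wise R^T (p - t): world -> camera frame
    uvw = p_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]
```

Evaluating the trajectory at `t_cam + time_offset` is what couples the temporal parameter to the image. A wrong offset moves the virtual camera along the vehicle's path, so the splatted image no longer lines up with the captured one.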