SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields

Quentin Herau, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou, Cyrille Migniot, Pascal Vasseur, Cédric Demonceaux

TL;DR

This paper leverages the ability of Neural Radiance Fields (NeRF) to represent different sensor modalities in a common volumetric representation, achieving robust and accurate spatio-temporal sensor calibration. It also designs a partitioning approach based on the part of the scene visible to each sensor, so that calibration relies only on overlapping areas.

Abstract

In rapidly evolving domains such as autonomous driving, the use of multiple sensors with different modalities is crucial to ensure high operational precision and stability. To correctly exploit the information provided by each sensor in a single common frame, these sensors must be accurately calibrated. In this paper, we leverage the ability of Neural Radiance Fields (NeRF) to represent different sensor modalities in a common volumetric representation to achieve robust and accurate spatio-temporal sensor calibration. By designing a partitioning approach based on the part of the scene visible to each sensor, we formulate the calibration problem using only the overlapping areas. This strategy results in a more robust and accurate calibration that is less prone to failure. We demonstrate that our approach works on outdoor urban scenes by validating it on multiple established driving datasets. Results show that our method achieves better accuracy and robustness than existing methods.

Paper Structure (40 sections, 8 equations, 17 figures, 8 tables)

Figures (17)

  • Figure 1: Method overview. SOAC is a novel multimodal spatio-temporal calibration method for cameras and LiDAR in the context of autonomous driving. By alternating the training of multiple implicit scenes (Sec. \ref{sec/method/step1}) and sensors co-registration from these representations (Sec. \ref{sec/method/step2}), SOAC achieves precise self-supervised calibration from raw data acquired in unconstrained urban environments.
  • Figure 2: SOAC training strategy. \ref{fig:method/step1} Scene representation training (Sec. \ref{sec/method/step1}): The parameters $\textcolor{red}{\hat{\Theta}}$ of each NeRF are trained with the images from their associated cameras and the LiDAR scans. The LiDAR calibration is also optimized through $\textcolor{red}{T^{n_{(i+2)}}}$. \ref{fig:method/step2} Extrinsic and temporal optimization (Sec. \ref{sec/method/step2}): The real frame from the sensor is compared to the predicted frame on the other NeRFs to calculate the losses. The calibration is then optimized with backpropagation through the poses $\textcolor{red}{T^{n_{(i+1)}}}$ and $\textcolor{red}{T^{n_{(i+2)}}}$.
  • Figure 3: SOAC's visibility grid (Sec. \ref{sec/method/filtergrid}). \ref{fig:filter_grid/filling} Grid filling: Rays from camera $C_i$ fill the visibility grid linked to NeRF $\Theta_i$. \ref{fig:filter_grid/filtering} Ray filtering: For cameras $\textcolor{orange}{C}_{\textcolor{orange}{j, \forall j} \neq \textcolor{blue}{i}}$, rays are kept or filtered according to the visibility recorded in \ref{fig:filter_grid/filling}.
  • Figure 4: Visualization of the visibility grid (Sec. \ref{sec/method/filtergrid}). Predictions made with the NeRF trained on the front camera, on a PandaSet \cite{xiao2021pandaset} sequence.
  • Figure 5: Results for SOAC and MOISST \cite{herau2023moisst} as box plots with log scale on KITTI-360 \cite{liao2022kitti} and nuScenes \cite{caesar2020nuscenes} sequences. The red lines show the initial error (best viewed in color).
  • ...and 12 more figures
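
The two steps named in the Figure 3 caption (grid filling, then ray filtering) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class `VisibilityGrid`, the helpers `sample_along_rays` and `filter_rays`, the uniform sampling along rays, and the `min_fraction` threshold are all hypothetical choices made for illustration, since the captions do not specify the grid resolution or the exact filtering criterion.

```python
import numpy as np


def sample_along_rays(origins, directions, near, far, n_samples):
    """Return (R, S, 3) points sampled uniformly in depth along R rays."""
    t = np.linspace(near, far, n_samples)
    return origins[:, None, :] + t[None, :, None] * directions[:, None, :]


class VisibilityGrid:
    """Boolean voxel grid marking the space observed by one camera C_i.

    Hypothetical structure: an axis-aligned cube of resolution^3 voxels.
    """

    def __init__(self, bounds_min, bounds_max, resolution):
        self.bounds_min = np.asarray(bounds_min, dtype=float)
        self.bounds_max = np.asarray(bounds_max, dtype=float)
        self.resolution = int(resolution)
        self.occupied = np.zeros((self.resolution,) * 3, dtype=bool)

    def _voxel_indices(self, points):
        """Map (N, 3) points to voxel indices plus an in-bounds mask."""
        extent = self.bounds_max - self.bounds_min
        normalized = (points - self.bounds_min) / extent
        idx = np.floor(normalized * self.resolution).astype(int)
        inside = np.all((idx >= 0) & (idx < self.resolution), axis=-1)
        return idx, inside

    def fill(self, origins, directions, near, far, n_samples=64):
        """Grid filling: mark voxels touched by samples on C_i's own rays."""
        points = sample_along_rays(origins, directions, near, far, n_samples)
        idx, inside = self._voxel_indices(points.reshape(-1, 3))
        idx = idx[inside]
        self.occupied[idx[:, 0], idx[:, 1], idx[:, 2]] = True

    def visible_fraction(self, origins, directions, near, far, n_samples=64):
        """Per-ray fraction of samples that land in already-visible voxels."""
        points = sample_along_rays(origins, directions, near, far, n_samples)
        n_rays, n_pts, _ = points.shape
        idx, inside = self._voxel_indices(points.reshape(-1, 3))
        hit = np.zeros(n_rays * n_pts, dtype=bool)
        valid = idx[inside]
        hit[inside] = self.occupied[valid[:, 0], valid[:, 1], valid[:, 2]]
        return hit.reshape(n_rays, n_pts).mean(axis=1)


def filter_rays(grid, origins, directions, near, far, min_fraction=0.5):
    """Ray filtering: keep rays from another camera C_j (j != i) only if
    enough of their samples fall in space that C_i has observed.

    The threshold rule is an assumption, not taken from the paper.
    """
    fraction = grid.visible_fraction(origins, directions, near, far)
    return fraction >= min_fraction
```

In this sketch, `fill` would be called once with the rays of camera $C_i$, and `filter_rays` would then return a boolean mask over the rays of any other camera $C_j$, so that only rays crossing the overlapping region contribute to the calibration losses.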