One target to align them all: LiDAR, RGB and event cameras extrinsic calibration for Autonomous Driving

Andrea Bertogalli, Giacomo Boracchi, Luca Magri

TL;DR

This paper tackles the extrinsic calibration of LiDAR, RGB, and event cameras for autonomous driving with a one-shot, multi-modal calibration framework. A novel 3D calibration target with frequency-coded LEDs and ArUco-based markers enables simultaneous extraction of cross-modal keypoints, which are fed into a Perspective-n-Point (PnP) solver to recover the relative pose $(R_{ij}, \mathbf{t}_{ij})$ of each sensor pair $(i, j)$. The approach reports significant gains in event–LiDAR calibration accuracy while maintaining competitive RGB–LiDAR alignment, validated on a custom autonomous driving dataset recorded with a full sensor rig. The method offers a practical path to unified sensor fusion in dynamic driving scenarios and can be extended to additional modalities or stereo calibration.

Abstract

We present a novel multi-modal extrinsic calibration framework designed to simultaneously estimate the relative poses between event cameras, LiDARs, and RGB cameras, with particular focus on the challenging event camera calibration. At the core of our approach is a novel 3D calibration target, specifically designed and constructed to be concurrently perceived by all three sensing modalities. The target encodes features in planes, ChArUco markers, and active LED patterns, tailored to the characteristics of LiDARs, RGB cameras, and event cameras, respectively. This design enables a one-shot, joint extrinsic calibration process, in contrast to existing approaches that typically rely on separate, pairwise calibrations. Our calibration pipeline is designed to accurately calibrate complex vision systems in the context of autonomous driving, where precise multi-sensor alignment is critical. We validate our approach through an extensive experimental evaluation on a custom-built dataset, recorded with an advanced autonomous driving sensor setup, confirming the accuracy and robustness of our method.
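
As an illustration of what a recovered pose $(R_{ij}, \mathbf{t}_{ij})$ means in practice, the sketch below evaluates the reprojection error that PnP solvers typically minimize: 3D target points expressed in one sensor frame (for example, LiDAR) are mapped into a camera frame and compared with the detected pixels. This is not the authors' implementation. It assumes an ideal pinhole camera without lens distortion, and the function names and the `(fx, fy, cx, cy)` intrinsics tuple are hypothetical choices made for this example.

```python
import math


def project(point_3d, R, t, K):
    """Project a 3D point into pixel coordinates with a pinhole model.

    R (3x3 nested lists) and t (length 3) map the point from the source
    sensor frame into the camera frame; K is (fx, fy, cx, cy).
    All names here are illustrative, not taken from the paper.
    """
    # Rigid transform into the camera frame: X_cam = R * X + t
    x_cam = [sum(R[r][c] * point_3d[c] for c in range(3)) + t[r] for r in range(3)]
    if x_cam[2] <= 0:
        raise ValueError("point is not in front of the camera")
    fx, fy, cx, cy = K
    # Perspective division followed by the intrinsic mapping
    return (fx * x_cam[0] / x_cam[2] + cx, fy * x_cam[1] / x_cam[2] + cy)


def rms_reprojection_error(points_3d, pixels, R, t, K):
    """Root-mean-square pixel distance between projections and detections."""
    squared_sum = 0.0
    for point, (u, v) in zip(points_3d, pixels):
        pu, pv = project(point, R, t, K)
        squared_sum += (pu - u) ** 2 + (pv - v) ** 2
    return math.sqrt(squared_sum / len(points_3d))
```

Given matched 3D points and detected pixels, a candidate pose with a lower `rms_reprojection_error` aligns the two sensors better. A PnP solver searches for the $(R_{ij}, \mathbf{t}_{ij})$ that makes this error as small as possible.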

Paper Structure

This paper contains 18 sections, 3 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: An overview of our calibration target. (a) Our target allows simultaneous feature detection across all reference frames. Specifically, the target design addresses event camera detection challenges through frequency-coded LEDs placed at the corners of the cube. (b) The sensor suite along with the estimated relative poses $(R_{ij}, \mathbf{t}_{ij})$ between each sensor pair $(i, j)$. For clarity, not all RGB cameras are visualized.
  • Figure 2: Our calibration pipeline consists of three feature detection branches operating on the same calibration target. Our novel event feature detection branch detects the seven cube corners $e_{i}$, the RGB feature detection branch detects the ArUco corners $a_{i}$, and the pointcloud feature detection branch detects the 3D points corresponding to the cube corners $E_{i}$ and the ArUco corners $A_i$. Finally, we apply the PnP algorithm to the correspondences $A_i \leftrightarrow a_i$ and $E_i \leftrightarrow e_i$.
  • Figure 3: The operating principle of our calibration target. Each LED on the cube blinks at a specific frequency, continuously generating events on the event camera and yielding a square-wave signal for each pixel. The event view shows an accumulated, frame-like representation of the event stream over a 33.333 ms time window. The LiDAR view highlights the cube inside the pointcloud, where darker colors represent points closer to the sensor.
  • Figure 4: Left: $n$ frequency maps $\mathbf{M}_{i}$ with their bounding boxes $R_i$ are analyzed to yield the best frequency map $\mathbf{M}_{\bar{i}}$ along with the bounding box $R_{\bar{i}}$. Right: some of the detected points $e_i$ (the centers of the fitted ellipses), which are identified by their associated frequency. For simplicity, only three ellipses are shown.
  • Figure 5: The output of the three steps used to detect the features (the seven cube corners $E_i$) in the LiDAR pointcloud. (a) The reduced pointcloud under consideration. (b) The fitted planes, which represent the cube's faces. (c) The seven detected points $E_i$, which are obtained by exploiting the geometry of the cube.
  • ...and 1 more figure
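
The frequency coding described in the Figure 3 caption can be illustrated with a minimal sketch: if an LED blinks as a square wave, the ON events at a pixel recur once per period, so the blink frequency can be estimated from the spacing of those events and matched against the known LED frequencies. This is a simplified stand-in, not the authors' frequency-map procedure. The function names, the microsecond timestamps, the `burst_gap_us` and `tolerance_hz` parameters, and the LED frequencies used with it are all assumptions made for this example.

```python
from statistics import median


def estimate_blink_frequency(on_timestamps_us, burst_gap_us=200):
    """Estimate the blink frequency (Hz) at one pixel from its ON-event times.

    Timestamps are in microseconds. A single rising edge may trigger several
    events in quick succession, so events closer than burst_gap_us to the
    previous event are merged into the same edge (an assumed heuristic).
    """
    edges = []
    previous = None
    for ts in sorted(on_timestamps_us):
        if previous is None or ts - previous > burst_gap_us:
            edges.append(ts)  # first event of a new burst marks the edge
        previous = ts
    if len(edges) < 2:
        return None  # not enough edges to measure a period
    # The median interval is robust to an occasional missed or spurious edge
    period_us = median(b - a for a, b in zip(edges, edges[1:]))
    return 1e6 / period_us


def match_led(frequency_hz, led_frequencies_hz, tolerance_hz=10.0):
    """Return the index of the LED with the closest nominal frequency, or None."""
    if frequency_hz is None:
        return None
    best = min(range(len(led_frequencies_hz)),
               key=lambda i: abs(led_frequencies_hz[i] - frequency_hz))
    if abs(led_frequencies_hz[best] - frequency_hz) > tolerance_hz:
        return None  # no LED is close enough to the measured frequency
    return best
```

Because each LED has its own frequency, a match identifies which cube corner a pixel belongs to, which is what makes the 2D detections $e_i$ directly associable with the 3D corners $E_i$.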