RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network

Van-Tin Luu, Yon-Lin Cai, Vu-Hoang Tran, Wei-Chen Chiu, Yi-Ting Chen, Ching-Chun Huang

TL;DR

RC-AutoCalib addresses online, targetless radar–camera calibration by introducing a Dual-Perspective representation that mitigates elevation ambiguity and radar sparsity. It combines a Selective Fusion mechanism with a Multi-Modal Cross-Attention module and a Noise-Resistant Matcher to learn robust radar–image correspondences, supervised explicitly by matching signals and calibrated by a 6-DoF regressor across iterations. The approach achieves state-of-the-art calibration accuracy on nuScenes, significantly outperforming prior radar–camera and several LiDAR–camera methods, and demonstrates good generalization in cross-dataset tests. The work also extends to LiDAR–camera calibration and shows negligible impact on downstream 3D perception tasks, underscoring practical applicability in ADAS and autonomous systems.

Abstract

This paper presents the first online automatic geometric calibration method for radar and camera systems. Given the significant data sparsity and measurement uncertainty in radar height data, achieving automatic calibration during system operation has long been a challenge. To address the sparsity issue, we propose a Dual-Perspective representation that gathers features from both frontal and bird's-eye views. The frontal view contains rich but sensitive height information, whereas the bird's-eye view provides features that are robust to height uncertainty. We therefore propose a novel Selective Fusion Mechanism to identify and fuse reliable features from both perspectives, reducing the effect of height uncertainty. Moreover, for each view, we incorporate a Multi-Modal Cross-Attention Mechanism to explicitly find location correspondences through cross-modal matching. During the training phase, we also design a Noise-Resistant Matcher to provide better supervision and enhance the robustness of the matching mechanism against sparsity and height uncertainty. Experiments on the nuScenes dataset demonstrate that our method significantly outperforms previous radar-camera auto-calibration methods, as well as existing state-of-the-art LiDAR-camera calibration techniques, establishing a new benchmark for future research. The code is available at https://github.com/nycu-acm/RC-AutoCalib.
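
The contrast between the two views can be illustrated with a minimal sketch of the radar side only. It is not the authors' implementation: the function names, the pinhole intrinsics `K`, the camera-frame axis convention (x right, y down, z forward), and the grid ranges are all assumptions made for illustration. It shows that a change in point height moves the frontal-view pixel while leaving the bird's-eye-view cell unchanged.

```python
import numpy as np

def transform_points(points, T):
    """Apply a 4x4 homogeneous transform T to an (N, 3) array of points."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    return (homo @ T.T)[:, :3]

def frontal_view_depth(points_cam, K, height, width):
    """Project camera-frame points to a sparse depth map (0 marks an empty pixel)."""
    pts = points_cam[points_cam[:, 2] > 1e-6]      # keep points in front of the camera
    uvw = pts @ K.T                                # pinhole projection (assumed model)
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf)
    # keep the nearest depth when several points fall on the same pixel
    np.minimum.at(depth, (v[inside], u[inside]), pts[inside, 2])
    depth[np.isinf(depth)] = 0.0
    return depth

def bev_occupancy(points_cam, x_range=(-10.0, 10.0), z_range=(0.0, 20.0), cell=0.5):
    """Rasterize points on the x-z plane; the height coordinate y is dropped."""
    nx = int(round((x_range[1] - x_range[0]) / cell))
    nz = int(round((z_range[1] - z_range[0]) / cell))
    ix = np.floor((points_cam[:, 0] - x_range[0]) / cell).astype(int)
    iz = np.floor((points_cam[:, 2] - z_range[0]) / cell).astype(int)
    inside = (ix >= 0) & (ix < nx) & (iz >= 0) & (iz < nz)
    grid = np.zeros((nz, nx))
    grid[iz[inside], ix[inside]] = 1.0
    return grid

# Hypothetical intrinsics and two points that differ only in height (y).
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
T = np.eye(4)                                      # identity extrinsic for the demo
p = transform_points(np.array([[1.0, 0.0, 10.0]]), T)
q = transform_points(np.array([[1.0, 0.5, 10.0]]), T)

fv_p, fv_q = frontal_view_depth(p, K, 48, 64), frontal_view_depth(q, K, 48, 64)
bev_p, bev_q = bev_occupancy(p), bev_occupancy(q)
print(np.argwhere(fv_p > 0), np.argwhere(fv_q > 0))  # different image rows
print(np.array_equal(bev_p, bev_q))                   # same BEV cell
```

In this toy setup the height perturbation shifts the frontal-view projection by several pixel rows, whereas the bird's-eye-view raster is identical, which is the intuition behind fusing the two perspectives selectively.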

Paper Structure

This paper contains 27 sections, 23 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: An overview of the proposed RC-AutoCalib method. The approach takes a miscalibrated radar-camera pair as input and represents it as feature pairs in the Dual-Perspective view. These feature representations are then enhanced by a feature matching block, from which reliable features are selected to predict the rotation vector and translation.
  • Figure 1: Illustration of bounding box $B$. Suppose we consider only the y and z axes to calculate $w_B$ based on $\delta$
  • Figure 2: Challenges of 3D Millimeter-Wave Radar. (a) The green dashed line represents the height plane the radar focuses on. Points $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ denote actual reflection positions, whereas $\mathbf{A}_{\text{radar}}$, $\mathbf{B}_{\text{radar}}$, and $\mathbf{C}_{\text{radar}}$ are the positions recorded by the radar. $\mathbf{D}_{\text{A}}$, $\mathbf{D}_{\text{B}}$, and $\mathbf{D}_{\text{C}}$ represent the recorded and noisy radar depths. (b) The top image shows a "LiDAR" depth map projected onto the camera plane, while the bottom image displays a "radar" depth map projected similarly. The red box highlights the issue where depths of points on the same object should be similar, yet significant variations are evident, indicating the presence of noise. Moreover, the green box shows a structural comparison: the "LiDAR" point cloud distinctly outlines the object’s contour, while the "radar" point cloud fails to convey structural information.
  • Figure 2: Calibration results by projecting radar points onto the FV image
  • Figure 3: Our system flow for iterative online auto-calibration starts with the input image, point cloud, and initial calibration parameters $T_{init}$, which first pass through the Data Transform module. Here, we obtain the estimated image depth map and miscalibrated radar depth map from the frontal view (FV) perspective, along with the pseudo-BEV image and miscalibrated BEV radar projection. These outputs are then processed in the Feature Extraction module, where features from both FV and BEV perspectives undergo Feature Matching between the image and radar data. Subsequently, after Feature Matching and Fusion, the Regression Head generates the rotation and translation vectors that form the transformation matrix, $\hat{T}_{pred}^{i}$, to refine calibration. Finally, $\hat{T}_{pred}^{i}$ is fed back to $T_{init}$ to update the calibration parameters for the next iteration.
  • ...and 4 more figures
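
The iterative loop described in the Figure 3 caption can be sketched as follows. This is a hedged illustration, not the paper's code: `stand_in_predictor` is a hypothetical placeholder that replaces the network with a damped share of the true residual, and the left-multiplication update `T = T_pred @ T` is an assumed composition convention. Only the conversion from a rotation vector and a translation to a 4x4 transform (Rodrigues' formula) is standard.

```python
import numpy as np

def rodrigues(rvec):
    """Rotation vector (axis * angle) to a 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    Kx = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def rotation_to_rvec(R):
    """Inverse of rodrigues for rotation angles below pi."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return axis / (2.0 * np.sin(theta)) * theta

def to_matrix(rvec, tvec):
    """Assemble a 4x4 homogeneous transform from the 6-DoF output."""
    T = np.eye(4)
    T[:3, :3] = rodrigues(rvec)
    T[:3, 3] = tvec
    return T

def stand_in_predictor(T_current, T_true, gain=0.7):
    """Hypothetical stand-in for the network: a damped share of the true residual."""
    residual = T_true @ np.linalg.inv(T_current)
    return gain * rotation_to_rvec(residual[:3, :3]), gain * residual[:3, 3]

def refine(T_init, T_true, iterations=5):
    """Feed each predicted correction back into the current estimate."""
    T = T_init.copy()
    for _ in range(iterations):
        rvec, tvec = stand_in_predictor(T, T_true)
        T = to_matrix(rvec, tvec) @ T      # assumed update convention
    return T

def errors(T_est, T_true):
    """Return (rotation error in radians, translation error in metres)."""
    residual = T_true @ np.linalg.inv(T_est)
    return np.linalg.norm(rotation_to_rvec(residual[:3, :3])), np.linalg.norm(residual[:3, 3])

# Demo: start about 10 degrees and 0.2 m away from a made-up ground truth.
T_true = to_matrix(np.array([0.02, -0.01, 0.03]), np.array([0.5, 0.1, -0.2]))
T_init = to_matrix(np.deg2rad([6.0, -5.0, 6.0]), np.array([0.1, -0.1, 0.14])) @ T_true
rot_before, trans_before = errors(T_init, T_true)
rot_after, trans_after = errors(refine(T_init, T_true), T_true)
print(rot_before, trans_before, rot_after, trans_after)
```

Because each pass only has to correct what the previous pass left over, an imperfect per-step prediction can still drive the residual toward zero, which is the motivation for running the regressor over several iterations.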