Online Targetless Radar-Camera Extrinsic Calibration Based on the Common Features of Radar and Camera

Lei Cheng, Siyang Cao

TL;DR

This work tackles extrinsic calibration between a radar and a camera without calibration targets, which enables online recalibration in dynamic environments. It introduces a deep-learning framework that learns common features from raw radar data (points $P_r$ in the radar coordinate system) and camera images, using a YOLO-based detector, a radar feature extractor, and a common feature discriminator to establish object correspondences. The extrinsic matrix $Q=[R|T]$ is then estimated by minimizing a PnP-like reprojection error, with robust initialization via RANSAC and refinement via Levenberg–Marquardt (LM), i.e., minimizing $Err_{rep} = \frac{1}{N} \sum_{i=1}^{N} \left[ \left\| P^{i}_{p\_gt} - s^{-1} K Q P^{i}_r \right\|_{2}^{2} \right]^{1/2}$, where $K$ is the camera intrinsic matrix, $s$ is the projective scale factor, and $P^{i}_{p\_gt}$ is the image point matched to radar point $P^{i}_r$. Real-world experiments yield 24 correspondences with 21 inliers, with a mean absolute reprojection error of $MARE=59.89$ px and an RMS reprojection error of $RMSRE=98.48$ px, demonstrating alignment of radar and image data without calibration targets and enabling online recalibration for robust sensor fusion.
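To make the reprojection error concrete, here is a minimal pure-Python sketch that projects radar points through $K Q$ and measures the pixel distance to the matched image points. The intrinsics, the identity extrinsics, the point values, and the function names are all illustrative assumptions, not values or code from the paper. The sketch also assumes the radar $z$ axis points along the camera's optical axis, and the two summary statistics are only analogous to the reported MARE and RMSRE, whose exact definitions are not given here.

```python
import math

def project(K, Q, p_radar):
    """Project one 3D radar point (x, y, z) to pixel coordinates (u, v)."""
    x, y, z = p_radar
    ph = (x, y, z, 1.0)  # homogeneous radar point
    # Q is the 3x4 extrinsic matrix [R|T]: radar frame -> camera frame
    cam = [sum(Q[r][c] * ph[c] for c in range(4)) for r in range(3)]
    # K is the 3x3 intrinsic matrix: camera frame -> homogeneous pixel
    uvw = [sum(K[r][c] * cam[c] for c in range(3)) for r in range(3)]
    s = uvw[2]  # projective scale (depth along the optical axis)
    return uvw[0] / s, uvw[1] / s

def reprojection_errors(K, Q, radar_pts, image_pts):
    """Euclidean pixel distance between each projected radar point and its match."""
    return [math.dist(project(K, Q, p), q) for p, q in zip(radar_pts, image_pts)]

def mean_error(errs):
    return sum(errs) / len(errs)

def rms_error(errs):
    return math.sqrt(sum(e * e for e in errs) / len(errs))

# Hypothetical intrinsics and extrinsics, chosen only for illustration.
K = [[1000.0, 0.0, 640.0],
     [0.0, 1000.0, 360.0],
     [0.0, 0.0, 1.0]]
Q = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

radar_pts = [(1.0, 0.5, 10.0), (-2.0, 0.0, 20.0)]
image_pts = [(743.0, 414.0), (540.0, 360.0)]

errs = reprojection_errors(K, Q, radar_pts, image_pts)
print(errs, mean_error(errs), rms_error(errs))
```

With these toy values the first point projects to (740, 410), 5 px from its match, and the second projects exactly onto its match. A RANSAC-style inlier test would compare each entry of `errs` against a pixel threshold.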

Abstract

Sensor fusion is essential for autonomous driving and autonomous robots, and radar-camera fusion systems have gained popularity due to their complementary sensing capabilities. However, accurate calibration between these two sensors is crucial to ensure effective fusion and improve overall system performance. Calibration involves intrinsic and extrinsic calibration, with the latter being particularly important for achieving accurate sensor fusion. Unfortunately, many target-based calibration methods require complex operating procedures and well-designed experimental conditions, posing challenges for researchers attempting to reproduce the results. To address this issue, we introduce a novel approach that leverages deep learning to extract a common feature from raw radar data (i.e., Range-Doppler-Angle data) and camera images. Instead of explicitly representing these common features, our method uses them implicitly to match identical objects from both data sources. Specifically, the extracted common feature serves as an example to demonstrate an online targetless calibration method between the radar and camera systems. The extrinsic transformation matrix is estimated through this feature-based approach. To enhance the accuracy and robustness of the calibration, we apply RANSAC and the Levenberg-Marquardt (LM) nonlinear optimization algorithm to derive the matrix. Our real-world experiments demonstrate the effectiveness and accuracy of the proposed method.
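One way to read the two-stage estimate is sketched below under the standard pinhole model, with $P_r$ taken as a homogeneous 4-vector. The inlier threshold $\tau$, the candidate matrix $Q_0$, and the inlier set $\mathcal{I}$ are notation introduced here for illustration, and the paper's exact formulation may differ.

```latex
% Pinhole projection of a radar point into the image (homogeneous coordinates)
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \, Q \, P_r, \qquad Q = [R \mid T]

% RANSAC stage: inlier set for a candidate Q_0 and an assumed pixel threshold \tau
\mathcal{I} = \left\{ i : \left\| P^{i}_{p\_gt} - s^{-1} K Q_0 P^{i}_r \right\|_2 < \tau \right\}

% LM stage: nonlinear least-squares refinement over the inliers only
\hat{Q} = \arg\min_{Q} \sum_{i \in \mathcal{I}} \left\| P^{i}_{p\_gt} - s^{-1} K Q P^{i}_r \right\|_2^2
```

RANSAC supplies a starting point that is insensitive to mismatched object pairs, and LM then refines it using only the correspondences that agree with that starting point.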

Paper Structure (12 sections, 2 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Framework for the proposed radar-camera online extrinsic co-calibration method. The pipeline showcases the sequential steps involved in calibrating the radar and camera sensors. The method first trains a deep-learning common feature discriminator to determine whether the detected objects in the radar and camera data share common features. Subsequently, the trained common feature discriminator is utilized to find matching objects in both radar and camera views based on the existence of common features. Finally, based on these matching objects, corresponding camera-radar point pairs are formed for calibration.
  • Figure 2: Architecture of the YOLO-based Common Feature Network. CSPResNet: Cross-Stage Partial ResNet. CBL: Convolution3D + Batch Normalization + LeakyReLU. SPP: spatial pyramid pooling. PANet: Path Aggregation Network (Bochkovskiy et al., 2020).
  • Figure 3: (a) The 24 image-radar point correspondences obtained through a block-based sampling strategy for calibration. (b) The correspondence between the projected radar points (i.e., the radar points from (a) projected onto the image using the calibration matrix) and the image points, as well as the inliers used for calibration.
  • Figure 4: Projecting radar points onto the image using the obtained calibration matrix. (a) Projection of individual radar points onto the corresponding targets in the image, namely the car and the person. (b) Trajectories of the image points and the projected radar points corresponding to the movement of the two targets at frame 39. (c) Trajectories of the image points and the projected radar points corresponding to the movement of the two targets at frame 204.
  • Figure 5: The experimental setup, where a laptop and an NVIDIA Jetson Xavier GPU establish a connection with the radar-camera system using Ethernet and USB cables to collect data.
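The matching and sampling steps described in the captions of Figures 1 and 3 can be sketched roughly as follows. Everything here is a hypothetical illustration: `score_fn` stands in for the trained common feature discriminator, the greedy pairing rule, the 0.5 threshold, the dictionary fields, and the reading of "block-based sampling" as "at most one correspondence per image block" are assumptions and are not taken from the paper.

```python
def match_objects(cam_objs, radar_objs, score_fn, threshold=0.5):
    """Greedily pair each camera detection with its best-scoring radar detection.

    score_fn(cam, radar) is a placeholder for the trained discriminator;
    here it is any callable returning a score in [0, 1].
    """
    pairs, used = [], set()
    for cam in cam_objs:
        best_j, best_score = None, threshold
        for j, rad in enumerate(radar_objs):
            if j in used:
                continue  # each radar detection is matched at most once
            score = score_fn(cam, rad)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None:
            used.add(best_j)
            pairs.append((cam["center"], radar_objs[best_j]["xyz"]))
    return pairs

def block_sample(pairs, image_size, grid=(4, 3)):
    """Keep at most one correspondence per image block (first seen wins)."""
    width, height = image_size
    cols, rows = grid
    chosen = {}
    for (u, v), xyz in pairs:
        col = min(int(u * cols / width), cols - 1)
        row = min(int(v * rows / height), rows - 1)
        chosen.setdefault((row, col), ((u, v), xyz))
    return list(chosen.values())

# Toy discriminator: detections "share features" when their tags agree.
def toy_score(cam, rad):
    return 1.0 if cam["tag"] == rad["tag"] else 0.0

cam_objs = [
    {"tag": "car", "center": (200.0, 300.0)},
    {"tag": "person", "center": (900.0, 350.0)},
    {"tag": "sign", "center": (50.0, 50.0)},  # no radar counterpart
]
radar_objs = [
    {"tag": "person", "xyz": (3.0, 0.2, 12.0)},
    {"tag": "car", "xyz": (-4.0, 0.1, 15.0)},
]

pairs = match_objects(cam_objs, radar_objs, toy_score)
# A later frame contributes a second car point that falls in the same block.
all_pairs = pairs + [((210.0, 310.0), (-4.1, 0.1, 15.2))]
sampled = block_sample(all_pairs, image_size=(1280, 720))
print(len(pairs), len(sampled))
```

Spreading correspondences across image blocks keeps the calibration from being dominated by points clustered in one region, which is one plausible motivation for a block-based strategy.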