Online, Target-Free LiDAR-Camera Extrinsic Calibration via Cross-Modal Mask Matching

Zhiwei Huang; Yikang Zhang; Qijun Chen; Rui Fan

TL;DR

The paper addresses robust online, target-free LiDAR-camera extrinsic calibration (LCEC) with MIAS-LCEC, a coarse-to-fine framework that leverages large vision models for cross-modal mask matching. A virtual camera projection generates LiDAR intensity projection (LIP) images; combined with MobileSAM-based segmentation, the cross-modal mask matching (C3M) algorithm then produces reliable 2D-2D correspondences, which are lifted to 3D-2D correspondences and solved with PnP to estimate the extrinsic matrix $^{C}_{L}T \,\in\, SE(3)$. The authors release an open-source toolbox and three real-world datasets, and show that MIAS-LCEC outperforms state-of-the-art online, target-free methods and approaches the accuracy of offline, target-based calibration across diverse indoor and outdoor scenarios. The work improves the cross-modal robustness and adaptability of data fusion in intelligent vehicles and is a step toward real-time, target-free LCEC in dynamic environments.
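The LIP step in the TL;DR can be pictured with a minimal sketch. It assumes a pinhole virtual camera with user-chosen intrinsics `K` and pose `T_vl` (both hypothetical inputs; the source does not say how MIAS-LCEC places its virtual camera) and keeps only the nearest point at each pixel. The function name `lidar_intensity_projection` is illustrative and not part of the authors' toolbox.

```python
import numpy as np


def lidar_intensity_projection(points, intensity, K, T_vl, width, height):
    """Render a LiDAR intensity projection (LIP) image through a virtual pinhole camera.

    points:    (N, 3) LiDAR points in the LiDAR frame.
    intensity: (N,) reflectance value of each point.
    K:         (3, 3) intrinsics of the (hypothetical) virtual camera.
    T_vl:      (4, 4) transform from the LiDAR frame to the virtual-camera frame.
    Returns the LIP image and an index map holding, for each pixel, the index of
    the LiDAR point rendered there (-1 for empty pixels).
    """
    # Move points into the virtual-camera frame and drop those behind it.
    homog = np.hstack([points, np.ones((len(points), 1))])
    cam = (T_vl @ homog.T).T[:, :3]
    idx = np.nonzero(cam[:, 2] > 1e-6)[0]
    cam = cam[idx]

    # Pinhole projection to integer pixel coordinates, keeping in-bounds points.
    uvw = cam @ K.T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, idx = u[inside], v[inside], cam[inside, 2], idx[inside]

    # Z-buffer: sort by pixel, then by depth, and keep the nearest point per pixel.
    pix = v * width + u
    order = np.lexsort((z, pix))
    _, first = np.unique(pix[order], return_index=True)
    keep = order[first]

    image = np.zeros((height, width))
    index_map = np.full((height, width), -1, dtype=int)
    image[v[keep], u[keep]] = intensity[idx[keep]]
    index_map[v[keep], u[keep]] = idx[keep]
    return image, index_map
```

Because every occupied LIP pixel stores the index of its source point, a 2D match found on the LIP image can be traced back to a 3D LiDAR point, which is what turns 2D-2D matches into 3D-2D pairs for the PnP stage.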

Abstract

LiDAR-camera extrinsic calibration (LCEC) is crucial for data fusion in intelligent vehicles. Offline, target-based approaches have long been the preferred choice in this field. However, they often adapt poorly to real-world environments, largely because the extrinsic parameters can change significantly after moderate shocks or during extended operation in environments with vibrations. In contrast, online, target-free approaches are more adaptable but typically lack robustness, primarily because cross-modal feature matching is challenging. Therefore, in this article, we unleash the full potential of large vision models (LVMs), an emerging trend in computer vision and robotics, especially for embodied artificial intelligence, to achieve robust and accurate online, target-free LCEC across a variety of challenging scenarios. Our main contributions are threefold: we introduce a novel framework, MIAS-LCEC; we provide an open-source, versatile calibration toolbox with an interactive visualization interface; and we publish three real-world datasets captured in various indoor and outdoor environments. The cornerstone of our framework and toolbox is the cross-modal mask matching (C3M) algorithm, which is built on a state-of-the-art (SoTA) LVM and generates sufficient and reliable matches. Extensive experiments on these real-world datasets demonstrate the robustness of our approach and its superior performance compared with SoTA methods, particularly for solid-state LiDARs with super-wide fields of view.
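The final stage, described in the TL;DR as a PnP estimate of $^{C}_{L}T$, can be illustrated with a linear DLT solver. This is a stand-in chosen for brevity, not necessarily the solver MIAS-LCEC uses: it assumes noise-free, outlier-free correspondences and at least six non-coplanar points. A practical pipeline would typically wrap it in RANSAC and refine it nonlinearly, for example with OpenCV's `cv2.solvePnPRansac`. The name `pnp_dlt` is hypothetical.

```python
import numpy as np


def pnp_dlt(points_3d, pixels, K):
    """Linear PnP (DLT): estimate R, t such that pixels ~ K (R X + t).

    points_3d: (N, 3) LiDAR points, N >= 6 and not all coplanar.
    pixels:    (N, 2) matched camera pixels.
    K:         (3, 3) camera intrinsics.
    """
    ones = np.ones((len(pixels), 1))
    # Normalized image coordinates remove K, so the unknown 3x4 matrix is s [R | t].
    norm = np.hstack([pixels, ones]) @ np.linalg.inv(K).T
    X = np.hstack([points_3d, ones])

    # Each correspondence gives two linear equations in the 12 entries of the matrix.
    rows = []
    for (x, y, _), Xh in zip(norm, X):
        rows.append(np.concatenate([Xh, np.zeros(4), -x * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -y * Xh]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    P = Vt[-1].reshape(3, 4)  # null-space solution, defined up to scale

    # Project the left 3x3 block onto the nearest orthogonal matrix and recover the scale.
    U, S, Wt = np.linalg.svd(P[:, :3])
    R = U @ Wt
    t = P[:, 3] / S.mean()
    # The null vector's sign is arbitrary; flip both so that det(R) = +1.
    if np.linalg.det(R) < 0:
        R, t = -R, -t
    return R, t
```

The recovered `R` and `t` form $^{C}_{L}T = \begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix}$, which maps LiDAR-frame points into the camera frame.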

Paper Structure (19 sections, 23 equations, 11 figures, 3 tables, 1 algorithm)

Introduction
Background
Existing Challenges and Motivation
Novel Contributions
Article Structure
Related Work
Target-Based Approaches
Target-Free Approaches
Methodology
Algorithm Overview
Cross-Modal Mask Matching
Experiment
Experimental Setup
Datasets
Evaluation Metrics
...and 4 more sections

Figures (11)

Figure 1: Visualization of the experimental results achieved using our proposed online, target-free LCEC algorithm.
Figure 2: The pipeline of our proposed online, target-free LCEC algorithm.
Figure 3: An example of two-stage, coarse-to-fine cross-modal mask matching results: (a)-(d) illustrate four examples of instance matching and corner point matching. Potential errors produced by the LVM are largely suppressed by our strict match selection criterion.
Figure 4: After the target masks $T_v$ and $T_c$ are matched, an affine transformation estimated from a pair of reference masks $R_v$ and $R_c$ is used to update the mask in the LIP image so that it more accurately reflects the actual matching relationship.
Figure 5: Our experimental setup, where two solid-state Livox LiDARs and one MindVision camera are utilized for data acquisition.
...and 6 more figures
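The mask update in Figure 4 can be sketched as a least-squares 2D affine fit. The sketch assumes that corresponding points (for instance, matched corner points) have already been extracted from the reference masks $R_c$ and $R_v$, and that the fitted transformation maps camera-image coordinates to LIP coordinates; the caption specifies neither detail, and `fit_affine` and `apply_affine` are hypothetical helpers.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2D affine A (2x3) with dst ~ A @ [src; 1].

    src, dst: (N, 2) corresponding points, N >= 3 and not all collinear.
    """
    src_h = np.hstack([src, np.ones((len(src), 1))])  # homogeneous (N, 3)
    # Solve src_h @ A.T ~ dst for the (3, 2) matrix A.T.
    A_T, *_ = np.linalg.lstsq(src_h, dst, rcond=None)
    return A_T.T


def apply_affine(A, pts):
    """Map (N, 2) points through the 2x3 affine A."""
    return pts @ A[:, :2].T + A[:, 2]
```

Under these assumptions, applying `apply_affine` to the vertices of the target mask $T_c$ would give an updated estimate of the target mask in the LIP image; with only three point pairs the fit is exact, and additional pairs are averaged in the least-squares sense.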