Online, Target-Free LiDAR-Camera Extrinsic Calibration via Cross-Modal Mask Matching

Zhiwei Huang, Yikang Zhang, Qijun Chen, Rui Fan

TL;DR

The paper tackles robust online, target-free LiDAR-camera extrinsic calibration (LCEC) by introducing MIAS-LCEC, a coarse-to-fine framework that uses a large vision model for cross-modal mask matching. A virtual camera projection generates LiDAR intensity projection (LIP) images. Together with MobileSAM-based segmentation, these allow the cross-modal mask matching (C3M) algorithm to produce reliable 2D-2D correspondences, which are subsequently turned into a 3D-2D PnP estimate of the extrinsic transformation $^{C}_{L}T \in SE(3)$. The authors release an open-source toolbox and three real-world datasets, and show that MIAS-LCEC outperforms state-of-the-art online, target-free methods and comes close to offline, target-based calibration in diverse indoor and outdoor scenarios. The work improves cross-modal robustness and adaptability for data fusion in intelligent vehicles and paves the way for real-time, target-free LCEC in dynamic environments.
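
To make the virtual camera projection step more concrete, here is a minimal Python sketch of rendering a LiDAR intensity image with a pinhole model. It is an illustration under stated assumptions, not the authors' implementation: the points are assumed to be already expressed in the virtual camera frame (z pointing forward), and the function name `project_intensity`, the intrinsics `fx, fy, cx, cy`, and the rule that the nearest point wins each pixel are all hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence, Tuple

# One LiDAR return: (x, y, z, intensity), assumed to be in the virtual camera frame.
LidarPoint = Tuple[float, float, float, float]


def project_intensity(
    points: Sequence[LidarPoint],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> List[List[float]]:
    """Render a simple intensity image from LiDAR points (hypothetical helper).

    Pixels that receive no point stay at 0.0. When several points land on the
    same pixel, the one with the smallest depth is kept (an assumption).
    """
    image = [[0.0] * width for _ in range(height)]
    nearest_depth: Dict[Tuple[int, int], float] = {}

    for x, y, z, intensity in points:
        if z <= 0.0:
            continue  # point is behind the virtual camera
        # Pinhole projection onto the image plane.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if not (0 <= u < width and 0 <= v < height):
            continue  # point falls outside the image
        if (v, u) not in nearest_depth or z < nearest_depth[(v, u)]:
            nearest_depth[(v, u)] = z
            image[v][u] = intensity
    return image


# Toy usage: two points share the centre pixel, one is behind the camera.
toy_points = [
    (0.0, 0.0, 2.0, 0.5),   # centre pixel, farther
    (0.0, 0.0, 1.0, 0.9),   # centre pixel, nearer, so it overwrites the first
    (1.0, 0.0, -1.0, 0.3),  # behind the camera, skipped
    (0.5, 0.0, 1.0, 0.2),   # one pixel to the right of centre
]
lip = project_intensity(toy_points, fx=2.0, fy=2.0, cx=2.0, cy=2.0, width=5, height=5)
```

Keeping only the nearest return per pixel is a simple way to handle occlusion in such a rendering; a real pipeline would also need the pose of the virtual camera and a strategy for sparse pixels, neither of which this page describes.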

Abstract

LiDAR-camera extrinsic calibration (LCEC) is crucial for data fusion in intelligent vehicles. Offline, target-based approaches have long been the preferred choice in this field. However, they often demonstrate poor adaptability to real-world environments. This is largely because extrinsic parameters may change significantly due to moderate shocks or during extended operations in environments with vibrations. In contrast, online, target-free approaches provide greater adaptability yet typically lack robustness, primarily due to the challenges in cross-modal feature matching. Therefore, in this article, we unleash the full potential of large vision models (LVMs), which are emerging as a significant trend in the fields of computer vision and robotics, especially for embodied artificial intelligence, to achieve robust and accurate online, target-free LCEC across a variety of challenging scenarios. Our main contributions are threefold: we introduce a novel framework known as MIAS-LCEC, provide an open-source versatile calibration toolbox with an interactive visualization interface, and publish three real-world datasets captured from various indoor and outdoor environments. The cornerstone of our framework and toolbox is the cross-modal mask matching (C3M) algorithm, developed based on a state-of-the-art (SoTA) LVM and capable of generating sufficient and reliable matches. Extensive experiments conducted on these real-world datasets demonstrate the robustness of our approach and its superior performance compared to SoTA methods, particularly for the solid-state LiDARs with super-wide fields of view.

Paper Structure

This paper contains 19 sections, 23 equations, 11 figures, 3 tables, and 1 algorithm.

Figures (11)

  • Figure 1: Visualization of the experimental results achieved using our proposed online, target-free LCEC algorithm.
  • Figure 2: The pipeline of our proposed online, target-free LCEC algorithm.
  • Figure 3: Examples of two-stage, coarse-to-fine cross-modal mask matching: (a)-(d) show four instance matching and corner point matching results. Potential errors produced by the LVM are greatly reduced by our strict match selection criterion.
  • Figure 4: Upon matching the target masks $T_v$ and $T_c$, an affine transformation estimated from a pair of reference masks, $R_v$ and $R_c$, is used to update the mask in the LIP image so that it more accurately reflects the actual matching relationship.
  • Figure 5: Our experimental setup, where two solid-state Livox LiDARs and one MindVision camera are utilized for data acquisition.
  • ...and 6 more figures
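
The Figure 4 caption mentions an affine transformation estimated from a pair of reference masks and then applied to a mask in the LIP image. The caption does not say how that transformation is estimated, so the Python sketch below is only a generic illustration: it fits a 2D affine map exactly from three corresponding points (for example, three points taken from $R_v$ and their counterparts in $R_c$) and applies it to other points. The helper names `estimate_affine` and `apply_affine`, and the choice of three point pairs, are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def _solve3(m: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve a 3x3 linear system with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("source points are collinear; affine map is not unique")
    solution = []
    for col in range(3):
        replaced = [list(row) for row in m]
        for row in range(3):
            replaced[row][col] = rhs[row]
        solution.append(_det3(replaced) / d)
    return solution


def estimate_affine(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Fit x' = a*x + b*y + c, y' = d*x + e*y + f from exactly three point pairs."""
    if len(src) != 3 or len(dst) != 3:
        raise ValueError("exactly three correspondences are required")
    m = [[x, y, 1.0] for x, y in src]
    a, b, c = _solve3(m, [p[0] for p in dst])
    d, e, f = _solve3(m, [p[1] for p in dst])
    return (a, b, c), (d, e, f)


def apply_affine(affine: Affine, points: Sequence[Point]) -> List[Point]:
    """Apply the fitted affine map to a list of 2D points."""
    (a, b, c), (d, e, f) = affine
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in points]


# Toy usage: reference points in the LIP image and in the camera image.
ref_lip = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ref_cam = [(2.0, 3.0), (4.0, 3.0), (2.0, 6.0)]
affine = estimate_affine(ref_lip, ref_cam)
moved = apply_affine(affine, [(1.0, 1.0)])  # -> [(4.0, 6.0)]
```

With more than three correspondences, a least-squares fit would be the usual choice; the exact three-point solve is used here only to keep the example dependency-free.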