
Hg-I2P: Bridging Modalities for Generalizable Image-to-Point-Cloud Registration via Heterogeneous Graphs

Pei An, Junfeng Ding, Jiaqi Yang, Yulong Wang, Jie Ma, Liangliang Nan

Abstract

Image-to-point-cloud (I2P) registration aims to align 2D images with 3D point clouds by establishing reliable 2D-3D correspondences. The drastic modality gap between images and point clouds makes it challenging to learn features that are both discriminative and generalizable, leading to severe performance drops in unseen scenarios. We address this challenge by introducing a heterogeneous graph that refines both cross-modal features and correspondences within a unified architecture. The proposed graph represents a mapping between segmented 2D and 3D regions, which enhances cross-modal feature interaction and thus improves feature discriminability. In addition, modeling the consistency among vertices and edges within the graph enables pruning of unreliable correspondences. Building on these insights, we propose a heterogeneous-graph-embedded I2P registration method, termed Hg-I2P. It learns a heterogeneous graph by mining multi-path feature relationships, adapts features under the guidance of heterogeneous edges, and prunes correspondences using graph-based projection consistency. Experiments on six indoor and outdoor benchmarks under cross-domain setups demonstrate that Hg-I2P significantly outperforms existing methods in both generalization and accuracy. Code is released at https://github.com/anpei96/hg-i2p-demo.
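
To make the abstract's two graph ideas concrete, here is a minimal NumPy sketch. It is not the authors' implementation (the linked repository holds that). It assumes each putative 2D-3D match already carries the segment id of its pixel and of its point, and it treats a heterogeneous edge simply as the count of matches linking a 2D region to a 3D region; Hg-I2P instead learns its edges by multi-path feature mining. Graph-based projection consistency is approximated by two hypothetical checks: the match's region pair must be backed by at least `min_support` matches, and the 3D point, projected with a given pose hypothesis (K, R, t), must land within `max_err` pixels of its 2D match. All function names, thresholds, and the fixed pose are illustrative assumptions; in a real pipeline the pose would be estimated from the matches.

```python
import numpy as np

def build_heterogeneous_edges(labels_2d, labels_3d, n_2d, n_3d):
    """Hypothetical edge weights: W[i, j] = number of matches linking 2D region i to 3D region j.

    labels_2d[m] / labels_3d[m]: segment ids of the pixel / point in match m (assumed given).
    Counting matches is a stand-in for the learned heterogeneous edges in the paper.
    """
    W = np.zeros((n_2d, n_3d))
    np.add.at(W, (labels_2d, labels_3d), 1.0)  # unbuffered add accumulates repeated (i, j) pairs
    return W

def consistency_mask(uv, xyz, labels_2d, labels_3d, W, K, R, t,
                     max_err=3.0, min_support=2):
    """Boolean mask of matches passing two illustrative consistency checks."""
    # Check 1 (graph): the match's 2D/3D region pair must be supported by enough matches.
    edge_ok = W[labels_2d, labels_3d] >= min_support
    # Check 2 (projection): move points into the camera frame, project with K,
    # and compare against the observed pixel.
    cam = xyz @ R.T + t
    in_front = cam[:, 2] > 1e-6
    depth = np.where(in_front, cam[:, 2], 1.0)  # avoid dividing by zero or negative depth
    uv_hat = (cam @ K.T)[:, :2] / depth[:, None]
    err = np.linalg.norm(uv_hat - uv, axis=1)
    return edge_ok & in_front & (err < max_err)

# Toy usage with a known pose (identity rotation, zero translation).
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
xyz = np.array([[0, 0, 1], [0.1, 0, 1], [0, 0.1, 1], [0, 0, -1]], dtype=float)
uv = np.array([[50, 50], [60, 50], [80, 80], [50, 50]], dtype=float)
labels_2d = np.array([0, 0, 0, 1])
labels_3d = np.array([0, 0, 0, 1])

W = build_heterogeneous_edges(labels_2d, labels_3d, n_2d=2, n_3d=2)
mask = consistency_mask(uv, xyz, labels_2d, labels_3d, W, K, R, t)
print(mask)  # third match fails projection, fourth lacks edge support (and lies behind the camera)
```

The two checks are complementary: the graph check rejects matches between regions that rarely co-occur, while the projection check rejects matches that are geometrically inconsistent with the pose even when their regions agree.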

Paper Structure

This paper contains 14 sections, 16 equations, 14 figures, 15 tables.

Figures (14)

  • Figure 1: Motivation of Hg-I2P. To achieve generalizable I2P registration, we reformulate the baseline architecture using a heterogeneous graph. Its heterogeneous edges enhance cross-modal feature interaction, improving feature discriminability. Projection constraints within the graph enable consistency-based pruning of mismatched correspondences, enhancing robustness. The resulting framework, Hg-I2P, jointly refines both features and matches, achieving strong generalization across unseen scenes.
  • Figure 2: Constructing a heterogeneous graph ($\mathcal{G}_H$) for cross-modal reasoning. This graph models the relationship between 2D and 3D regions. By explicitly linking visual and geometric entities through heterogeneous edges, it captures structured 2D-3D dependencies essential for joint reasoning.
  • Figure 3: Pipeline of Hg-I2P. MP-mining learns heterogeneous edges $\mathcal{E}_{\text{I2P}}$ and constructs the graph $\mathcal{G}_H$. HE-adapting refines cross-modal features via message passing along heterogeneous edges, improving cross-modal alignment. HC-pruning enforces projection consistency within $\mathcal{G}_H$ to remove outliers. Together, these components refine both features and correspondences for robust I2P registration.
  • Figure 4: Overview of HC-pruning. It filters incorrect correspondences using two projection consistency criteria derived from $\mathcal{G}_H$.
  • Figure 5: Effect of the proposed HE-adapting module. (a) Baseline without HE-adapting. (b) With HE-adapting, cross-modal features are more accurately aligned, yielding more inliers.
  • ...and 9 more figures
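
HE-adapting is described as refining cross-modal features via message passing along heterogeneous edges. The sketch below shows that general pattern under simplifying assumptions that are ours, not the paper's: each region is a plain feature vector, each region receives the edge-weighted mean of the features of its linked regions in the other modality, and the result is blended with the region's own feature by a hypothetical residual weight `alpha`. Regions without any heterogeneous edge keep their features unchanged. The learned layers of the actual module are not reproduced; `he_message_passing` and `row_normalize` are illustrative names.

```python
import numpy as np

def row_normalize(W):
    """Divide each row by its sum; rows with no edges stay all-zero."""
    s = W.sum(axis=1, keepdims=True)
    return np.divide(W, s, out=np.zeros_like(W, dtype=float), where=s > 0)

def he_message_passing(F2d, F3d, W, alpha=0.5, n_iters=1):
    """Hypothetical mean-aggregation message passing between 2D and 3D region features.

    F2d: (n2d, d) image-region features; F3d: (n3d, d) point-cloud-region features.
    W:   (n2d, n3d) nonnegative heterogeneous edge weights.
    """
    agg_into_2d = row_normalize(W)    # (n2d, n3d): weights over each 2D region's linked 3D regions
    agg_into_3d = row_normalize(W.T)  # (n3d, n2d): weights over each 3D region's linked 2D regions
    has_edge_2d = W.sum(axis=1) > 0
    has_edge_3d = W.sum(axis=0) > 0
    for _ in range(n_iters):
        # Compute both messages from the current features before updating either side;
        # isolated regions use their own feature as the "message", so they stay unchanged.
        msg_2d = np.where(has_edge_2d[:, None], agg_into_2d @ F3d, F2d)
        msg_3d = np.where(has_edge_3d[:, None], agg_into_3d @ F2d, F3d)
        F2d = (1 - alpha) * F2d + alpha * msg_2d
        F3d = (1 - alpha) * F3d + alpha * msg_3d
    return F2d, F3d

# Toy usage: 2D region 0 links to 3D region 0, 2D region 1 to 3D region 1;
# 3D region 2 has no heterogeneous edge.
F2d = np.array([[1.0, 0.0], [0.0, 1.0]])
F3d = np.array([[0.0, 1.0], [1.0, 0.0], [3.0, 3.0]])
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
F2d_new, F3d_new = he_message_passing(F2d, F3d, W)
print(F2d_new)
print(F3d_new)
```

In this deliberately extreme toy case, one round pulls each linked pair of 2D and 3D region features onto the same vector, so the distance between linked features drops from about 1.41 to 0; with real features the effect is a partial pull toward cross-modal agreement rather than an exact collapse.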