GLASS: Geometry-aware Local Alignment and Structure Synchronization Network for 2D-3D Registration

Zhixin Cheng, Jiacheng Deng, Xinjun Li, Bohao Liao, Li Liu, Xiaotian Yin, Baoqun Yin, Tianzhu Zhang

Abstract

Image-to-point cloud registration methods typically follow a coarse-to-fine pipeline, extracting patch-level correspondences and refining them into dense pixel-to-point matches. However, in scenes with repetitive patterns, images often lack sufficient 3D structural cues and alignment with point clouds, leading to incorrect matches. Moreover, prior methods usually overlook structural consistency, limiting the full exploitation of correspondences. To address these issues, we propose two novel modules: the Local Geometry Enhancement (LGE) module and the Graph Distribution Consistency (GDC) module. LGE enhances both image and point cloud features with normal vectors, injecting geometric structure into image features to reduce mismatches. GDC constructs a graph from matched points to update features and explicitly constrain similarity distributions. Extensive experiments and ablations on two benchmarks, RGB-D Scenes v2 and 7-Scenes, demonstrate that our approach achieves state-of-the-art performance in image-to-point cloud registration.

Paper Structure

This paper contains 15 sections, 21 equations, 6 figures, 10 tables, 2 algorithms.

Figures (6)

  • Figure 1: (a) Visualization of reduced mismatches aided by normal-based structural information. The normals provide shared attributes between the image and the point cloud, enhancing the matching accuracy between them. (b) Visualization of structural similarity distribution constraint. By enforcing the structural consistency constraint, the alignment of image and point cloud keypoints is improved, leading to a better understanding of the scene.
  • Figure 2: Overall pipeline of GLASS, comprising the Local Geometry Enhancement (LGE) and Graph Distribution Consistency (GDC) modules. In LGE, the image branch predicts surface normals via an Image Normal Prediction Head, trained using pseudo normal labels generated by Depth Anything v2. The point cloud branch computes normals using a Point Normal Estimation Unit. By fusing normals with the initial features, the model gains structural awareness and reduces mismatches. In GDC, matched points in the local neighborhoods of both the image and the point cloud are used to construct graphs that update the features. Similarity distributions are then constrained to enforce structural consistency, improving the quality of dense correspondences.
  • Figure 3: Visualization of modality differences. With training, the modalities and distributions of the point cloud and image become more aligned.
  • Figure 4: Visualization of the image-to-point cloud matching results of GLASS. To rigorously analyze the performance, we set the error threshold to a strict 30px. As seen, our method significantly improves the matching performance in challenging scenarios.
  • Figure 5: The visualization of the point cloud projection onto the image shows that our method achieves an accurate rigid transformation, without causing significant misalignment.
  • ...and 1 more figure
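
The Figure 2 caption says that GDC constrains similarity distributions across the two modalities to enforce structural consistency. The page does not give the actual loss, so the following is only a minimal sketch of one plausible form. It assumes cosine similarity between matched keypoint features, a softmax with a temperature, and a symmetric KL divergence; the function names and the temperature value are hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_distributions(feats, temperature=0.1):
    """For each keypoint, a distribution over its similarity to every other keypoint.

    Requires at least two keypoints. The temperature is an assumed hyperparameter.
    """
    rows = []
    for i, fi in enumerate(feats):
        logits = [cosine(fi, fj) / temperature
                  for j, fj in enumerate(feats) if j != i]
        rows.append(softmax(logits))
    return rows

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consistency_loss(img_feats, pcd_feats, temperature=0.1):
    """Symmetric KL between image-side and point-side similarity distributions.

    img_feats[k] and pcd_feats[k] are assumed to describe the k-th matched pair.
    """
    p = similarity_distributions(img_feats, temperature)
    q = similarity_distributions(pcd_feats, temperature)
    total = sum(kl(pi, qi) + kl(qi, pi) for pi, qi in zip(p, q))
    return 0.5 * total / len(p)

# Toy matched keypoint features (illustrative values only).
img = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pcd_rotated = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]  # same pairwise structure
pcd_other = [[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]]     # different pairwise structure

loss_same = consistency_loss(img, pcd_rotated)
loss_diff = consistency_loss(img, pcd_other)
```

In this toy setup the loss depends only on the pairwise structure within each modality, so features that differ by a rotation incur essentially no penalty, while features whose neighborhood relations disagree are penalized.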