DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching

Shuzhe Wang, Juho Kannala, Daniel Barath

TL;DR

DGC-GNN is a novel algorithm that employs a global-to-local Graph Neural Network (GNN) to progressively exploit geometric and color cues when representing keypoints, improving matching accuracy and substantially narrowing the performance gap between descriptor-based and descriptor-free methods.

Abstract

Matching 2D keypoints in an image to a sparse 3D point cloud of the scene without requiring visual descriptors has garnered increased interest due to its low memory requirements, inherent privacy preservation, and reduced need for expensive 3D model maintenance compared to visual descriptor-based methods. However, existing algorithms often compromise on performance, resulting in a significant deterioration compared to their descriptor-based counterparts. In this paper, we introduce DGC-GNN, a novel algorithm that employs a global-to-local Graph Neural Network (GNN) that progressively exploits geometric and color cues to represent keypoints, thereby improving matching accuracy. Our procedure encodes both Euclidean and angular relations at a coarse level, forming the geometric embedding to guide the point matching. We evaluate DGC-GNN on both indoor and outdoor datasets, demonstrating that it not only doubles the accuracy of the state-of-the-art visual descriptor-free algorithm but also substantially narrows the performance gap between descriptor-based and descriptor-free methods.
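
The abstract mentions encoding Euclidean and angular relations at a coarse level. The following minimal sketch illustrates what such pairwise relations look like for a handful of coarse points. The function name `pairwise_relations` and the choice of measuring angles between direction vectors from the origin are assumptions made for illustration only; the paper's actual geometric embedding may be defined differently.

```python
import math


def pairwise_relations(centers):
    """Return (dist, angle) matrices for a list of non-zero 3D points.

    dist[i][j]  : Euclidean distance between points i and j.
    angle[i][j] : angle in radians between the directions of points i and j,
                  both taken as vectors from the origin (an assumed convention).
    """
    n = len(centers)
    dist = [[0.0] * n for _ in range(n)]
    angle = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonal stays at zero
            a, b = centers[i], centers[j]
            dist[i][j] = math.dist(a, b)
            dot = sum(x * y for x, y in zip(a, b))
            cos = dot / (math.hypot(*a) * math.hypot(*b))
            # Clamp to [-1, 1] to guard against floating-point round-off.
            angle[i][j] = math.acos(max(-1.0, min(1.0, cos)))
    return dist, angle


# Hypothetical usage with three made-up cluster centers.
centers = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 2.0)]
dist, angle = pairwise_relations(centers)
```

Angles between directions do not change when all points are rotated together about the origin, and neither do pairwise distances, which is one reason such quantities are attractive as geometric cues.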

Paper Structure (19 sections, 12 equations, 7 figures, 7 tables)

Figures (7)

  • Figure 1: 2D-3D matching (shown by green lines) with the proposed DGC-GNN and GoMatch [zhou2022geometry]. In this example, DGC-GNN obtains 78 correct matches with a camera translation error of 0.02 meters and a rotation error of 0.24$^\circ$, while GoMatch finds only 17 inliers with a pose error of 0.37 meters and 4.37$^\circ$.
  • Figure 2: Pipeline overview. For keypoints from the 2D image and 3D points from the point cloud, the proposed DGC-GNN (1) takes the bearing vectors and the color at each bearing vector as input. (2) It extracts the point-wise position and color features with two separate encoders and mixes the features as $\mathbf{f_p}$ and $\mathbf{f_q}$. (3) The bearing vectors are clustered into $K$ groups, and geometric graphs are built upon the clusters to extract the global-level geometric embeddings $\mathbf{\hat{f}}_{\mathbf{p}}^{gg}$ and $\mathbf{\hat{f}}_{\mathbf{q}}^{gg}$. (4) It then concatenates $\mathbf{\hat{f}}_{\mathbf{p}}^{gg}$ with $\mathbf{f_p}$ and $\mathbf{\hat{f}}_{\mathbf{q}}^{gg}$ with $\mathbf{f_q}$, and builds a local graph at each point as self-attention. A cluster-based attention module is adopted to enhance the local features by forcing messages to pass only between the most related features. A differentiable layer matches and optimizes the improved features to obtain the score matrix $\mathcal{S}$. Finally, an outlier rejection network is applied to prune the matches with low confidence, leading to the final 2D-3D correspondences $\mathcal{M}_{final}$.
  • Figure 3: Cluster-based geometric encoding. (a) The clusters obtained from bearing vectors $\mathcal{Q}$ of the 3D point cloud are visualized by color. The local graph is created from the neighboring cluster centers. Black 3D points are filtered out from matching. (b) Angular embedding from the global graph to obtain rotation-invariant geometric cues. (c) The clusters obtained from 2D keypoints' bearing vectors $\mathcal{P}$. As in the 3D case, the local graph is created from the neighboring cluster centers.
  • Figure 4: Outlier sensitivity. The AUC scores of BPnPNet [campbell2020solving], GoMatch [zhou2022geometry], and the proposed DGC-GNN thresholded at 1, 5, and 10 pixels are plotted as a function of the outlier ratio. Oracle represents the AUC upper bound using ground truth matches.
  • Figure 5: Points Reprojection and Image Recovery Example.
  • ...and 2 more figures
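
The Figure 2 caption states that the pipeline takes bearing vectors as input for both 2D keypoints and 3D points. As a rough illustration, the sketch below computes unit-length bearing vectors under a standard pinhole camera model. The function names, the intrinsics parameters `fx`, `fy`, `cx`, `cy`, and the unit-norm convention are assumptions introduced here, not details taken from the paper. The 3D point is assumed to be already expressed in a camera frame; how that frame is chosen is outside this sketch.

```python
import math


def keypoint_to_bearing(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to a unit-length ray in the camera frame.

    Assumes a pinhole camera without lens distortion.
    """
    x = (u - cx) / fx
    y = (v - cy) / fy
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)


def point_to_bearing(point):
    """Normalize a 3D point given in the camera frame to a unit-length ray."""
    norm = math.sqrt(sum(c * c for c in point))
    if norm == 0.0:
        raise ValueError("the camera center has no bearing direction")
    return tuple(c / norm for c in point)


# Hypothetical usage: a pixel and a 3D point that lie on the same viewing ray
# produce the same bearing vector.
b2d = keypoint_to_bearing(820.0, 240.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
b3d = point_to_bearing((3.0, 0.0, 3.0))
```

Expressing both modalities as rays removes the dependence on image resolution and on the depth of the 3D point, so a correct 2D-3D pair seen from the true camera pose maps to the same direction.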