A2-GNN: Angle-Annular GNN for Visual Descriptor-free Camera Relocalization

Yejun Zhang, Shuzhe Wang, Juho Kannala

TL;DR

A2-GNN introduces an Angle-Annular Graph Neural Network for visual descriptor-free camera relocalization, leveraging bearing vectors and a local graph with angle-based neighbor grouping to capture robust geometric structure. The architecture combines a feature encoder, angle-annular geometric processing, optimal transport for initial correspondences, and bearing-vector-based outlier rejection, trained with a joint matching and outlier loss. Empirical results on MegaDepth, Cambridge Landmarks, and 7Scenes show state-of-the-art performance among descriptor-free methods with substantial efficiency gains over prior descriptor-free approaches. The work highlights the feasibility and practical impact of descriptor-free 2D–3D matching for robust, privacy-preserving localization, while acknowledging remaining gaps to descriptor-based methods and sensitivity to high outlier ratios.
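
The bearing vectors mentioned above are a standard way to put 2D keypoints and 3D points into a common representation: each one becomes a unit direction in the camera frame. The following is a minimal sketch of that conversion, assuming a pinhole camera with a known intrinsic matrix `K`. The function names and the example intrinsics are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np

def pixels_to_bearing(keypoints, K):
    """Convert (N, 2) pixel coordinates to (N, 3) unit bearing vectors.

    Assumes an ideal pinhole camera with intrinsic matrix K (no distortion).
    """
    keypoints = np.asarray(keypoints, dtype=float)
    ones = np.ones((keypoints.shape[0], 1))
    homogeneous = np.hstack([keypoints, ones])   # (N, 3) homogeneous pixels
    rays = homogeneous @ np.linalg.inv(K).T      # back-project through K^-1
    return rays / np.linalg.norm(rays, axis=1, keepdims=True)

def points_to_bearing(points_cam):
    """Convert (N, 3) points given in a camera frame to unit bearing vectors."""
    points_cam = np.asarray(points_cam, dtype=float)
    return points_cam / np.linalg.norm(points_cam, axis=1, keepdims=True)

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
keypoints = np.array([[320.0, 240.0],   # at the principal point
                      [820.0, 240.0]])  # 500 px to the right of it
bearings = pixels_to_bearing(keypoints, K)
print(bearings)
```

A keypoint at the principal point maps to the optical axis, and a keypoint one focal length to the side maps to a ray 45 degrees off that axis.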

Abstract

Visual localization involves estimating the 6-degree-of-freedom (6-DoF) camera pose within a known scene. A critical step in this process is identifying pixel-to-point correspondences between 2D query images and 3D models. Most advanced approaches currently rely on extensive visual descriptors to establish these correspondences, which raises challenges in storage, privacy, and model maintenance. Direct 2D-3D keypoint matching without visual descriptors is becoming popular as it can overcome those challenges. However, existing descriptor-free methods suffer from low accuracy or heavy computation. Addressing this gap, this paper introduces the Angle-Annular Graph Neural Network (A2-GNN), a simple approach that efficiently learns robust geometric structural representations with annular feature extraction. Specifically, this approach clusters neighbors and embeds each group's distance information and angle as supplementary information to capture local structures. Evaluation on matching and visual localization datasets demonstrates that our approach achieves state-of-the-art accuracy with low computational overhead among visual descriptor-free methods. Our code will be released at https://github.com/YejunZhang/a2-gnn.
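
The idea of clustering neighbors and summarizing each group by distance and angle can be illustrated with a small, non-learned sketch. It assumes unit bearing vectors as input, sorts each point's `k` nearest neighbors by distance, splits them into equally sized rings, and averages the distance and the angle to the center point within each ring. The function name, the equal-size ring split, and the mean pooling are assumptions made for illustration; the paper's actual layer learns embeddings and may group neighbors differently.

```python
import numpy as np

def annular_neighbor_features(bearings, k=6, num_rings=3):
    """Return (N, num_rings, 2) features: per-ring mean distance and mean angle.

    Illustrative only: rings are equal-size slices of the k nearest neighbors,
    ordered from the innermost ring to the outermost.
    """
    bearings = np.asarray(bearings, dtype=float)
    n = bearings.shape[0]
    if k >= n or k % num_rings != 0:
        raise ValueError("need k < N and k divisible by num_rings")

    # Pairwise Euclidean distances between bearing vectors, self excluded.
    diff = bearings[:, None, :] - bearings[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)

    # Indices of the k nearest neighbors, closest first.
    order = np.argsort(dist, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dist, order, axis=1)

    # Angle between each center bearing vector and its neighbors.
    cosine = np.clip(bearings @ bearings.T, -1.0, 1.0)
    nn_angle = np.arccos(np.take_along_axis(cosine, order, axis=1))

    # Split the sorted neighbors into rings and pool within each ring.
    ring_dist = nn_dist.reshape(n, num_rings, -1).mean(axis=2)
    ring_angle = nn_angle.reshape(n, num_rings, -1).mean(axis=2)
    return np.stack([ring_dist, ring_angle], axis=-1)

# Toy input: 20 random unit vectors standing in for keypoint bearing vectors.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(20, 3))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
feats = annular_neighbor_features(vectors, k=6, num_rings=3)
print(feats.shape)
```

Because neighbors are sorted before being split, the mean distance grows from the inner ring to the outer ring, which gives each node a coarse radial profile of its local structure.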

Paper Structure

This paper contains 15 sections, 12 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Matching accuracy and efficiency comparisons for descriptor-free methods. Compared with GoMatch [zhou2022geometry] and DGC-GNN [wang2024dgc], our A2-GNN learns effective and accurate 2D-3D matching.
  • Figure 2: Architecture Overview. The bearing vector (BV) and RGB information from the query image and 3D point cloud are first processed through an encoder to generate high-dimensional features. These features are then used to construct the graph nodes. In the self-attention layer, the angle-annular convolution is employed to extract discriminative geometric information from the neighboring points. After the GNNs, these enhanced features are used to establish initial correspondences via optimal transport. Outlier rejection is then applied to eliminate erroneous correspondences, resulting in a final set of accurate correspondences.
  • Figure 3: Illustration of angle embedding. The angle embedding between a node and its neighbor is added to enhance the feature representation.
  • Figure 4: Outlier Sensitivity. Comparison of the AUC for GoMatch [zhou2022geometry], DGC-GNN [wang2024dgc], and the proposed A2-GNN under different outlier ratios at 1-, 5-, and 10-pixel thresholds. Oracle is the upper bound obtained by using ground-truth matches.
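
The optimal-transport step named in the Figure 2 caption can be sketched with a plain log-domain Sinkhorn iteration followed by a mutual-best check. This is a simplified illustration under stated assumptions: uniform marginals, no unmatched ("dustbin") bin, and toy one-hot features standing in for the learned node features. The function names, the temperature of 0.1, and the iteration count are hypothetical and are not taken from the paper.

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def sinkhorn(scores, num_iters=50):
    """Log-domain Sinkhorn: soft assignment with uniform row/column marginals."""
    m, n = scores.shape
    log_mu = np.full(m, -np.log(m))  # target row mass 1/m
    log_nu = np.full(n, -np.log(n))  # target column mass 1/n
    u = np.zeros(m)
    v = np.zeros(n)
    for _ in range(num_iters):
        u = log_mu - logsumexp(scores + v[None, :], axis=1)
        v = log_nu - logsumexp(scores + u[:, None], axis=0)
    return np.exp(scores + u[:, None] + v[None, :])

def mutual_matches(assignment):
    """Keep (i, j) pairs that are each other's best match."""
    row_best = assignment.argmax(axis=1)
    col_best = assignment.argmax(axis=0)
    return [(i, int(j)) for i, j in enumerate(row_best) if col_best[j] == i]

# Toy features: 3D point j is a copy of 2D keypoint perm[j].
feat_2d = np.eye(4)
perm = [2, 0, 3, 1]
feat_3d = feat_2d[perm]
scores = feat_2d @ feat_3d.T / 0.1   # similarity divided by a temperature
assignment = sinkhorn(scores)
matches = mutual_matches(assignment)
print(matches)
```

In a full pipeline, the matches produced here would only be initial correspondences; a separate outlier-rejection stage, as described in the caption, would filter them before pose estimation.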