Global-Aware Edge Prioritization for Pose Graph Initialization

Tong Wei, Giorgos Tolias, Jiri Matas, Daniel Barath

TL;DR

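A GNN ranks candidate pose graph edges by their utility for SfM, and the initial pose graph is built from multiple spanning trees guided by these ranks. This globally informed initialization yields more reliable and compact pose graphs, improving reconstruction accuracy in sparse and high-speed settings and outperforming state-of-the-art (SOTA) retrieval methods on ambiguous scenes.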

Abstract

The pose graph is a core component of Structure-from-Motion (SfM), where images act as nodes and edges encode relative poses. Since geometric verification is expensive, SfM pipelines restrict the pose graph to a sparse set of candidate edges, making initialization critical. Existing methods rely on image retrieval to connect each image to its $k$ nearest neighbors, treating pairs independently and ignoring global consistency. We address this limitation through the concept of edge prioritization, ranking candidate edges by their utility for SfM. Our approach has three components: (1) a GNN trained with SfM-derived supervision to predict globally consistent edge reliability; (2) multi-minimal-spanning-tree-based pose graph construction guided by these ranks; and (3) connectivity-aware score modulation that reinforces weak regions and reduces graph diameter. This globally informed initialization yields more reliable and compact pose graphs, improving reconstruction accuracy in sparse and high-speed settings and outperforming SOTA retrieval methods on ambiguous scenes. The code and trained models are available at https://github.com/weitong8591/global_edge_prior.
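
To make component (2) concrete, the following is a minimal sketch of building a pose graph as the union of several spanning trees. It is not the authors' implementation: it assumes that higher scores mean more reliable edges, that successive trees are edge-disjoint, and that each tree is found with plain Kruskal. The names `spanning_forest`, `multi_mst_graph`, and the toy `scores` are hypothetical.

```python
def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def spanning_forest(n, edges):
    """Kruskal over `edges`, which must already be sorted best-first.

    Returns the selected tree edges (a forest if the edges do not
    connect all n nodes).
    """
    parent = list(range(n))
    tree = []
    for u, v in edges:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree


def multi_mst_graph(n, scores, k):
    """Union of up to k successive, edge-disjoint spanning trees.

    scores: dict {(u, v): score} with u < v.
    Assumption: a higher score means a more reliable edge, so sorting
    by descending score corresponds to ascending rank.
    """
    remaining = sorted(scores, key=scores.get, reverse=True)
    selected = []
    for _ in range(k):
        tree = spanning_forest(n, remaining)
        if not tree:
            break  # no candidate edges left
        selected.extend(tree)
        used = set(tree)
        remaining = [e for e in remaining if e not in used]
    return selected


# Toy example: 4 images with made-up scores on the complete graph.
scores = {
    (0, 1): 0.9, (0, 2): 0.2, (0, 3): 0.1,
    (1, 2): 0.8, (1, 3): 0.3, (2, 3): 0.7,
}
print(multi_mst_graph(4, scores, k=1))
print(multi_mst_graph(4, scores, k=2))
```

With `k=1` the sketch keeps only the three highest-scoring edges that connect all four images. Each further tree adds redundancy from the next-best unused edges, so unlike per-image nearest-neighbor selection, every round that still has enough candidates spans the whole image set.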

Paper Structure (16 sections, 15 equations, 5 figures, 3 tables)

This paper contains 16 sections, 15 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Given a set of input image pairs (left), our method ranks all candidate edges by global matchability (middle) and constructs a compact, well-connected pose graph via multi-MST selection (right). The resulting initialization enables accurate and stable 3D reconstruction, even under sparse or ambiguous settings.
  • Figure 2: Overall pipeline. Input images are first encoded using a fine-tuned image encoder (DINOv2 backbone with SALAD aggregation). A complete graph is then constructed over image embeddings and processed by our GNN–MLP model to predict global edge ranks. During training, these predictions are supervised using geometry-derived ranking signals via a differentiable ranking loss. At inference, the predicted ranks guide the construction of multiple minimum spanning trees, whose union forms the initial pose graph. Incremental SfM is finally applied on this graph to recover the sparse 3D reconstruction.
  • Figure 3: COLMAP [schonberger2016structure] reconstruction performance using pose graphs constructed from multiple MSTs guided by baseline embedding similarities or our learned global edge ranks. Top row: Relative pose accuracy on IMC23-PhotoTourism [imc2023] (AUC@2.5$^\circ$, left; AUC@5$^\circ$, middle) and on MegaDepth [li2018megadepth] (AUC@5$^\circ$, right) as the number of MSTs $k$ increases. Bottom row: Percentage of registered cameras on PhotoTourism (left), AUC@5$^\circ$ versus COLMAP runtime on PhotoTourism (middle), and percentage of accurately reconstructed cameras on VisymScenes [xiangli2025doppelgangers++] (right). Across all benchmarks, our method consistently achieves the highest accuracy, particularly in the sparse regime ($k=1$–$2$), while maintaining competitive or superior reconstruction efficiency.
  • Figure 4: Edge selection strategies for pose graph initialization. We report COLMAP [schonberger2016structure] relative pose AUC@5$^\circ$ on IMC23-PhotoTourism [imc2023] (left) and MegaDepth [li2018megadepth] (middle) when using either $k$NN or the proposed $k$ minimal spanning trees (MSTs) approach for graph construction. Oracle$^\ast$ curves correspond to ground-truth rankings from SfM-derived geometry. The percentage of successfully registered cameras on PhotoTourism is shown on the right. MST-based selection yields substantially stronger connectivity and higher accuracy than $k$NN, and closely follows the oracle behavior across all values of $k$.
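
The captions above report relative pose accuracy as AUC@2.5$^\circ$ and AUC@5$^\circ$. As a hedged illustration, the sketch below shows one common way such a score is computed: the area under the recall-versus-error curve up to the threshold, normalized by the threshold. The paper's exact evaluation protocol is not given here and may differ. The function name `pose_auc` is hypothetical, and the input is assumed to be one angular error in degrees per image pair.

```python
def pose_auc(errors_deg, threshold_deg):
    """Normalized area under the recall-vs-error curve up to a threshold.

    errors_deg: one angular pose error (degrees) per evaluated pair.
    Assumption: pairs with no estimated pose are passed in with a large
    error, so they count against recall.
    Returns a value in [0, 1]; 1 means every error is exactly 0.
    """
    errors = sorted(errors_deg)
    n = len(errors)
    if n == 0:
        return 0.0

    # Curve starts at (0 error, 0 recall).
    xs, ys = [0.0], [0.0]
    for i, err in enumerate(errors):
        if err > threshold_deg:
            break
        xs.append(float(err))
        ys.append((i + 1) / n)

    # Extend the curve horizontally to the threshold.
    xs.append(float(threshold_deg))
    ys.append(ys[-1])

    # Trapezoidal integration, then normalize by the threshold.
    area = sum(
        (xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2.0
        for i in range(len(xs) - 1)
    )
    return area / threshold_deg


# Toy example with made-up errors: two accurate pairs, one failure.
print(pose_auc([1.0, 2.0, 10.0], threshold_deg=5.0))
```

Because the curve is integrated rather than thresholded, a pair with a 1$^\circ$ error contributes more than one with a 4$^\circ$ error, even though both fall under the 5$^\circ$ cutoff.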