Global Structure-from-Motion Revisited

Linfei Pan, Dániel Baráth, Marc Pollefeys, Johannes L. Schönberger

TL;DR

The paper tackles scalable, accurate 3D reconstruction from images by revisiting global Structure-from-Motion and introducing GLOMAP, a global SfM system that jointly estimates camera and 3D point positions directly from image rays rather than relying on separate translation averaging. Key contributions include a feature-track construction strategy, a unified global positioning objective with a robust, initialization-free formulation, and an accompanying global bundle adjustment, plus camera clustering to handle large internet-scale collections. Empirical results on calibrated and uncalibrated datasets show GLOMAP attaining accuracy comparable to state-of-the-art incremental SfM (e.g., COLMAP) while delivering orders-of-magnitude faster performance, with strong robustness to unknown intrinsics and sequential data. The work also provides extensive ablations, qualitative reconstructions, and a public code release, highlighting practical impact for scalable 3D mapping and novel-view synthesis applications.

Abstract

Recovering 3D structure and camera motion from images has been a long-standing focus of computer vision research and is known as Structure-from-Motion (SfM). Solutions to this problem are categorized into incremental and global approaches. Until now, the most popular systems follow the incremental paradigm due to its superior accuracy and robustness, while global approaches are drastically more scalable and efficient. With this work, we revisit the problem of global SfM and propose GLOMAP as a new general-purpose system that outperforms the state of the art in global SfM. In terms of accuracy and robustness, we achieve results on-par or superior to COLMAP, the most widely used incremental SfM, while being orders of magnitude faster. We share our system as an open-source implementation at https://github.com/colmap/glomap.

Paper Structure

This paper contains 25 sections, 4 equations, 6 figures, 13 tables.

Figures (6)

  • Figure 1: The proposed GLOMAP produces satisfying reconstructions on various datasets. For (b), from left to right, the reconstructions are estimated by Theia [theia-manual], COLMAP [schoenberger2016sfm], and GLOMAP. While the baseline models fail to produce reliable estimates, GLOMAP achieves high accuracy.
  • Figure 2: Pipeline of the proposed GLOMAP system, a global Structure-from-Motion framework that distinguishes itself from other global methods by merging the translation averaging and triangulation phases into a single global positioning step.
  • Figure 3: Global Positioning. The left figure visualizes the initial configuration, depicting randomly initialized cameras and points. Black arrows, traversing through colored circles on the image planes, denote the measurements. Dashed lines represent the actual image rays, which are subject to optimization by adjusting the positions of the points and the cameras while their orientations remain constant. The right figure displays the outcome following the minimization of angles between the measurements (solid lines) and the image rays from the 3D points (dashed lines).
  • Figure S1: Example reconstructions from the proposed GLOMAP on various datasets.
  • Figure S2: Qualitative results for novel view synthesis with Instant-NGP [muller2022instant]. The differences on bicycle, bonsai, garden, room and stump are visually evident.
  • ...and 1 more figure
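
The global positioning step described in the Figure 3 caption can be illustrated with a small sketch. It is not the GLOMAP implementation: the function names, the toy scene, and the scale-factor residual `v - d * (X - c)` with `d >= 0` are assumptions chosen for illustration, and the robust loss and the actual optimizer are omitted. The sketch only evaluates the cost. It shows that the cost is zero when cameras and points are consistent with the measured ray directions, and that for a unit measurement the best-scale residual equals the sine of the angle between measurement and image ray (for angles below 90 degrees).

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def normalize(a):
    n = norm(a)
    return [x / n for x in a]

def angular_error(v, c, X):
    """Angle in radians between the unit measurement v and the ray from camera c to point X."""
    r = normalize(sub(X, c))
    cos_t = max(-1.0, min(1.0, dot(v, r)))  # clamp against rounding
    return math.acos(cos_t)

def scaled_residual(v, c, X):
    """Minimum over d >= 0 of ||v - d * (X - c)|| (assumed residual form)."""
    r = sub(X, c)
    d = max(0.0, dot(v, r) / dot(r, r))  # closed-form best non-negative scale
    return norm([vi - d * ri for vi, ri in zip(v, r)])

def objective(observations, cameras, points):
    """Sum of residuals over all (camera index, point index, unit ray) observations."""
    return sum(scaled_residual(v, cameras[i], points[k]) for i, k, v in observations)

# Hypothetical toy scene: two cameras, two points, orientations already known,
# so every measurement is a unit direction expressed in the global frame.
cameras_gt = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
points_gt = [[0.0, 0.0, 5.0], [1.0, 1.0, 4.0]]
observations = [
    (i, k, normalize(sub(points_gt[k], cameras_gt[i])))
    for i in range(len(cameras_gt))
    for k in range(len(points_gt))
]

cost_at_truth = objective(observations, cameras_gt, points_gt)

# Moving one point off its true position makes the rays disagree with the measurements.
points_bad = [[0.5, 0.0, 5.0], [1.0, 1.0, 4.0]]
cost_perturbed = objective(observations, cameras_gt, points_bad)
```

In a full solver, the camera centers, point positions, and scales would be the free variables of a nonlinear least-squares problem started from random values, as the caption describes; here they are only plugged in to compare costs.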