Loop Closure from Two Views: Revisiting PGO for Scalable Trajectory Estimation through Monocular Priors

Tian Yi Lim, Boyang Sun, Marc Pollefeys, Hermann Blum

TL;DR

We address scalability in Visual SLAM by removing dense 3D reconstruction and leveraging two-view loop closures to drive pose-graph optimization. The proposed 2GO framework constructs scale-free and absolute LC-Edges between sparse keyframes using state-of-the-art image matching and monocular priors, and optimizes them with a GTSAM-based PGO backend. Across KITTI, VBR, and a long real-world robot sequence, 2GO delivers real-time performance with significantly smaller maps and often superior trajectory accuracy compared with BA-based baselines. This map-free approach broadens the applicability of VSLAM to long-duration, large-scale deployments where dense reconstruction is unnecessary, while maintaining competitive precision.

Abstract

(Visual) Simultaneous Localization and Mapping (SLAM) remains a fundamental challenge in enabling autonomous systems to navigate and understand large-scale environments. Traditional SLAM approaches struggle to balance efficiency and accuracy, particularly in large-scale settings where extensive computational resources are required for scene reconstruction and Bundle Adjustment (BA). However, this scene reconstruction, in the form of sparse point clouds of visual landmarks, is often only used within the SLAM system because navigation and planning methods require different map representations. In this work, we therefore investigate a more scalable Visual SLAM (VSLAM) approach without reconstruction, based mainly on two-view loop closures. By restricting the map to a sparse keyframed pose graph without dense geometry representations, our "2GO" system achieves efficient optimization with competitive absolute trajectory accuracy. In particular, we find that recent advancements in image matching and monocular depth priors enable very accurate trajectory optimization without BA. We conduct extensive experiments on diverse datasets, including large-scale scenarios, and provide a detailed analysis of the trade-offs between runtime, accuracy, and map size. Our results demonstrate that this streamlined approach supports real-time performance, scales well in map size and trajectory duration, and effectively broadens the capabilities of VSLAM for long-duration deployments in large environments.

Paper Structure

This paper contains 20 sections, 7 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Top: 2GO, our proposed system, refines a 1.1 hr, 2.7 km trajectory collected from a quadrupedal robot, overlaid on a 3D reconstruction of the surroundings. Absolute and scale-free loop-closure edges are visible in yellow and blue, respectively. Loop-closure edges with both types of constraints present are in red. Bottom: 2GO significantly improves the quality of the input trajectory, particularly in the Z-direction.
  • Figure 2: System Overview. From an input stream of odometry and images, we first downsample keyframes. Then we check for loop-closure candidates within previous keyframes and estimate the relative pose between candidate image pairs (metric and/or up-to-scale). Candidates are then filtered using geometric checks and then added as two-view edges to the pose-graph map, which is optimized with PGO. Marked in yellow is the main innovation of this paper.
  • Figure 3: Two-view Pose Estimation. Both CL and MV variants establish 2D-2D and 2D-3D feature correspondences for scale-free and absolute LC-Edges, respectively. The CL-variant employs separate deep learning models in a modular pipeline, while the MV-variant achieves this with a single two-view reconstruction model.
  • Figure 4: ciampino_train0 sequence: 2GO improves on VINS-Fusion odometry, reducing ATE from 131.33 m to 22.84 m.
  • Figure 5: Scale-free LC-Edge Examples. The top row contains samples from diag_train0, while the bottom row is from our real-world recording. Both LCs are constructed from image pairs with large baselines and contribute to trajectory refinement.
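
The pipeline in Figure 2 ends with loop-closure edges being added to a pose graph that is then optimized. The following is a minimal sketch of that last step under strong simplifying assumptions: poses are 2D positions only (no rotation), every edge is a metric displacement (so it loosely mirrors an absolute LC-Edge, not a scale-free one), and the solver is a plain linear least-squares solve in pure Python, not the GTSAM backend the paper uses. All function names (`optimize_positions`, `dead_reckon`, `ate_rmse`) and the toy square trajectory are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Tuple

# (i, j, dx, dy, weight): measured displacement from pose i to pose j.
Edge = Tuple[int, int, float, float, float]
Point = Tuple[float, float]


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def optimize_positions(num_poses: int, edges: List[Edge]) -> List[Point]:
    """Translation-only pose-graph optimization (toy stand-in for PGO).

    Minimizes sum_k w_k * ||(p_j - p_i) - d_k||^2 with pose 0 fixed at the
    origin. The problem is linear, so each axis is solved independently
    through its normal equations.
    """
    n = num_poses - 1  # pose 0 is the anchor and is not an unknown
    xs = [0.0] * num_poses
    ys = [0.0] * num_poses
    for axis, out in ((0, xs), (1, ys)):
        H = [[0.0] * n for _ in range(n)]
        g = [0.0] * n
        for i, j, dx, dy, w in edges:
            d = dx if axis == 0 else dy
            ends = ((i, -1.0), (j, 1.0))  # residual = p_j - p_i - d
            for a, sa in ends:
                if a == 0:
                    continue
                g[a - 1] += w * sa * d
                for b, sb in ends:
                    if b != 0:
                        H[a - 1][b - 1] += w * sa * sb
        sol = solve_linear(H, g)
        for k in range(1, num_poses):
            out[k] = sol[k - 1]
    return list(zip(xs, ys))


def dead_reckon(odometry: List[Edge]) -> List[Point]:
    """Integrate consecutive odometry edges (k -> k+1) starting at the origin."""
    poses = [(0.0, 0.0)]
    for _, _, dx, dy, _ in odometry:
        x, y = poses[-1]
        poses.append((x + dx, y + dy))
    return poses


def ate_rmse(estimate: List[Point], truth: List[Point]) -> float:
    """Position RMSE without trajectory alignment (both start at the origin)."""
    sq = [(ex - tx) ** 2 + (ey - ty) ** 2
          for (ex, ey), (tx, ty) in zip(estimate, truth)]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    # Toy unit square; odometry carries a constant +0.1 drift in x.
    truth = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    odometry = [(0, 1, 1.1, 0.0, 1.0), (1, 2, 0.1, 1.0, 1.0), (2, 3, -0.9, 0.0, 1.0)]
    loop_closures = [(3, 0, 0.0, -1.0, 1.0)]  # pose 3 re-observes pose 0
    drifted = dead_reckon(odometry)
    refined = optimize_positions(4, odometry + loop_closures)
    print("ATE before:", ate_rmse(drifted, truth))
    print("ATE after: ", ate_rmse(refined, truth))
```

With equal weights, the single loop-closure edge spreads the accumulated drift evenly over the loop, which is why the refined trajectory has a lower error than the integrated odometry. A real system would optimize full 6-DoF poses with a nonlinear solver and robust noise models, and scale-free edges would constrain only the direction of the relative translation.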