Loop Closure from Two Views: Revisiting PGO for Scalable Trajectory Estimation through Monocular Priors
Tian Yi Lim, Boyang Sun, Marc Pollefeys, Hermann Blum
TL;DR
We address scalability in Visual SLAM by removing dense 3D reconstruction and leveraging two-view loop closures to drive pose-graph optimization. The proposed 2GO framework constructs scale-free and absolute LC-Edges between sparse keyframes using state-of-the-art image matching and monocular priors, and optimizes them with a GTSAM-based PGO backend. Across KITTI, VBR, and a long real-world robot sequence, 2GO delivers real-time performance with significantly smaller maps and often superior trajectory accuracy compared with BA-based baselines. This map-free approach broadens the applicability of VSLAM to long-duration, large-scale deployments where dense reconstruction is unnecessary, while maintaining competitive precision.
Abstract
(Visual) Simultaneous Localization and Mapping (SLAM) remains a fundamental challenge in enabling autonomous systems to navigate and understand large-scale environments. Traditional SLAM approaches struggle to balance efficiency and accuracy, particularly in large-scale settings where scene reconstruction and Bundle Adjustment (BA) require extensive computational resources. However, this scene reconstruction, in the form of sparse point clouds of visual landmarks, is often used only within the SLAM system itself, because navigation and planning methods require different map representations. In this work, we therefore investigate a more scalable Visual SLAM (VSLAM) approach without reconstruction, built mainly on two-view loop closures. By restricting the map to a sparse keyframed pose graph without dense geometry representations, our "2GO" system achieves efficient optimization with competitive absolute trajectory accuracy. In particular, we find that recent advancements in image matching and monocular depth priors enable very accurate trajectory optimization without BA. We conduct extensive experiments on diverse datasets, including large-scale scenarios, and provide a detailed analysis of the trade-offs between runtime, accuracy, and map size. Our results demonstrate that this streamlined approach supports real-time performance, scales well in map size and trajectory duration, and effectively broadens the capabilities of VSLAM for long-duration deployments in large environments.
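
To make the idea of a landmark-free pose graph concrete, the following is a minimal toy sketch, not the 2GO implementation. It assumes a 1D trajectory, scalar poses, and hand-picked weights, and it solves the resulting linear least-squares problem in pure Python rather than with GTSAM on SE(3) poses. The function names `solve_linear_system` and `optimize_pose_graph_1d` are hypothetical. The robot moves two steps forward and two steps back; drifting odometry says it ends 0.2 away from the start, while a single loop-closure edge between the first and last keyframes says it returned to the start.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def optimize_pose_graph_1d(num_poses, edges):
    """Weighted least squares over 1D poses; pose 0 is fixed at 0.

    edges: list of (i, j, measurement, weight), each meaning
    x_j - x_i should be close to `measurement`.
    """
    n = num_poses - 1  # unknowns are x_1 .. x_{num_poses-1}
    H = [[0.0] * n for _ in range(n)]  # normal-equation matrix J^T W J
    g = [0.0] * n                      # right-hand side J^T W z
    for i, j, z, w in edges:
        terms = ((i, -1.0), (j, 1.0))  # Jacobian of x_j - x_i
        for a, sa in terms:
            if a == 0:
                continue  # fixed pose contributes no unknown
            g[a - 1] += w * sa * z
            for c, sc in terms:
                if c != 0:
                    H[a - 1][c - 1] += w * sa * sc
    return [0.0] + solve_linear_system(H, g)


# Odometry edges between consecutive keyframes (forward steps overestimated).
odometry = [(0, 1, 1.1, 1.0), (1, 2, 1.1, 1.0),
            (2, 3, -1.0, 1.0), (3, 4, -1.0, 1.0)]
# One loop-closure edge: keyframe 4 observes the same place as keyframe 0.
loop_closure = [(0, 4, 0.0, 1.0)]

drifted = optimize_pose_graph_1d(5, odometry)
corrected = optimize_pose_graph_1d(5, odometry + loop_closure)
print("odometry only:    ", [round(x, 3) for x in drifted])
print("with loop closure:", [round(x, 3) for x in corrected])
```

With equal weights, the loop-closure edge pulls the final pose from 0.2 to 0.04 and spreads the correction evenly over the four odometry edges. The map here consists only of keyframe poses and relative-pose edges, which is why its size grows with the number of keyframes rather than with the number of observed landmarks.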
