Visual Loop Closure Detection Through Deep Graph Consensus
Martin Büchner, Liza Dahiya, Simon Dorer, Vipul Ramtekkar, Kenji Nishimiya, Daniele Cattaneo, Abhinav Valada
TL;DR
LoopGNN addresses robust visual loop closure detection by replacing pairwise keyframe comparisons with a deep graph consensus over a neighborhood of keyframes. It constructs a maximum-similarity clique around a query keyframe, encodes per-frame features with NetVLAD, and applies a Graph Attention Network to predict loop-closure scores, followed by RANSAC-based geometric verification. On TartanDrive 2.0 and NCLT, LoopGNN outperforms traditional visual place recognition (VPR) and deep loop closure baselines in precision and recall, remains robust across various deep keypoint encoders, and is more computationally efficient. Because fewer candidate loop closures require expensive verification, the approach enables more reliable online SLAM. Code and data are publicly released for future research.
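As an illustration of the neighborhood retrieval step, the sketch below selects the keyframes most similar to a query keyframe by cosine similarity of global descriptors. It is not the released LoopGNN code: the function name `retrieve_neighborhood`, the parameters `k` and `min_gap`, and the plain top-k selection are assumptions made for this example, and the descriptor matrix merely stands in for NetVLAD-style global features.

```python
import numpy as np


def retrieve_neighborhood(descriptors, query_idx, k=5, min_gap=0):
    """Return (indices, similarities) of the k keyframes most similar to the query.

    descriptors: (N, D) array of global descriptors, one row per keyframe.
    min_gap:     hypothetical parameter; keyframes within this many indices of
                 the query (including the query itself) are excluded so that
                 temporally adjacent frames are not returned as loop candidates.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    # L2-normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(descriptors, axis=1, keepdims=True)
    unit = descriptors / np.clip(norms, 1e-12, None)
    sims = unit @ unit[query_idx]

    # Mask out the query and its temporal neighbors.
    idx = np.arange(len(descriptors))
    sims[np.abs(idx - query_idx) <= min_gap] = -np.inf

    # Sort by descending similarity and keep at most k valid candidates.
    order = np.argsort(-sims)[:k]
    order = order[np.isfinite(sims[order])]
    return order, sims[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = rng.normal(size=(100, 32))  # random stand-in descriptors
    neighbors, scores = retrieve_neighborhood(toy, query_idx=50, k=5, min_gap=10)
    print(neighbors, scores)
```

The query together with the returned neighbors would form the node set of the graph; how edges and node features are actually defined in LoopGNN is not specified in this summary.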
Abstract
Visual loop closure detection traditionally relies on place recognition methods to retrieve candidate loops that are validated using computationally expensive RANSAC-based geometric verification. False positive loop closures significantly degrade downstream pose graph estimates, yet in online simultaneous localization and mapping (SLAM) scenarios, limited time and compute resources constrain how many candidates can be verified. While most deep loop closure detection approaches only operate on pairs of keyframes, we relax this constraint by considering neighborhoods of multiple keyframes when detecting loops. In this work, we introduce LoopGNN, a graph neural network architecture that estimates loop closure consensus by leveraging cliques of visually similar keyframes retrieved through place recognition. By propagating deep feature encodings among nodes of the clique, our method yields high-precision estimates while maintaining high recall. Extensive experimental evaluations on the TartanDrive 2.0 and NCLT datasets demonstrate that LoopGNN outperforms traditional baselines. Additionally, an ablation study across various keypoint extractors demonstrates that our method is robust regardless of the type of deep feature encoding used and exhibits higher computational efficiency than classical geometric verification baselines. We release our code, supplementary material, and keyframe data at https://loopgnn.cs.uni-freiburg.de.
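To make the idea of propagating feature encodings among the nodes of a clique concrete, the following is a minimal single-head graph attention layer in NumPy, following the standard GAT formulation. It is a sketch under stated assumptions rather than the LoopGNN architecture: the function name `gat_layer`, the tensor shapes, and the randomly initialized, untrained weights are all hypothetical.

```python
import numpy as np


def gat_layer(h, adj, W, a, slope=0.2):
    """Single-head graph attention layer (standard GAT form, untrained sketch).

    h:   (N, F) node features, e.g. one encoding per keyframe in the clique.
    adj: (N, N) boolean adjacency; must include self-loops so that every
         row has at least one True entry.
    W:   (F, F_out) shared linear projection.
    a:   (2 * F_out,) attention vector.
    Returns the updated node features (N, F_out) and attention weights (N, N).
    """
    z = h @ W
    f_out = z.shape[1]

    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into source and target terms.
    src = z @ a[:f_out]
    dst = z @ a[f_out:]
    e = src[:, None] + dst[None, :]
    e = np.where(e > 0, e, slope * e)

    # Restrict attention to graph neighbors, then apply a stable row softmax.
    e = np.where(adj, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)

    # Each node aggregates its neighbors' projected features.
    return alpha @ z, alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_nodes, f_in, f_out = 6, 16, 8           # toy clique: query + 5 neighbors
    h = rng.normal(size=(n_nodes, f_in))      # stand-in keyframe encodings
    adj = np.ones((n_nodes, n_nodes), bool)   # fully connected clique
    W = rng.normal(size=(f_in, f_out))
    a = rng.normal(size=2 * f_out)
    h_new, alpha = gat_layer(h, adj, W, a)
    print(h_new.shape, alpha.sum(axis=1))
```

In a trained model, a readout on the updated node features would produce a loop-closure score per candidate; that head is omitted here because this summary does not describe it.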
