Visual Loop Closure Detection Through Deep Graph Consensus

Martin Büchner, Liza Dahiya, Simon Dorer, Vipul Ramtekkar, Kenji Nishimiya, Daniele Cattaneo, Abhinav Valada

TL;DR

LoopGNN tackles robust visual loop closure detection by moving beyond pairwise frame comparisons to a neighborhood-based deep graph consensus. It constructs a maximum-similarity clique around a query keyframe, encodes per-frame features with NetVLAD, and applies a graph attention network to predict loop-closure scores, followed by RANSAC-based geometric verification. On TartanDrive 2.0 (TD2.0) and NCLT, LoopGNN outperforms traditional visual place recognition (VPR) and deep loop closure baselines in precision and recall, remains robust across various deep keypoint encoders, and is more computationally efficient than classical geometric verification. The approach reduces the number of candidate loop closures that require expensive verification, enabling more reliable online SLAM. Code and data are publicly released for future research.

Abstract

Visual loop closure detection traditionally relies on place recognition methods to retrieve candidate loops that are validated using computationally expensive RANSAC-based geometric verification. As false positive loop closures significantly degrade downstream pose graph estimates, verifying a large number of candidates in online simultaneous localization and mapping scenarios is constrained by limited time and compute resources. While most deep loop closure detection approaches only operate on pairs of keyframes, we relax this constraint by considering neighborhoods of multiple keyframes when detecting loops. In this work, we introduce LoopGNN, a graph neural network architecture that estimates loop closure consensus by leveraging cliques of visually similar keyframes retrieved through place recognition. By propagating deep feature encodings among nodes of the clique, our method yields high-precision estimates while maintaining high recall. Extensive experimental evaluations on the TartanDrive 2.0 and NCLT datasets demonstrate that LoopGNN outperforms traditional baselines. Additionally, an ablation study across various keypoint extractors demonstrates that our method is robust, regardless of the type of deep feature encodings used, and exhibits higher computational efficiency compared to classical geometric verification baselines. We release our code, supplementary material, and keyframe data at https://loopgnn.cs.uni-freiburg.de.
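
The trade-off between precision and recall described above is often summarized in loop closure work as the maximum recall reachable at 100% precision, a quantity the Figure 5 caption also refers to. The following is a minimal sketch of how such a number could be computed from per-candidate scores. The function name, the inputs `scores` and `labels`, and the strict-threshold rule are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def max_recall_at_full_precision(scores, labels):
    """Highest recall achievable with a score threshold that admits no false positive.

    scores: one confidence value per candidate loop closure (hypothetical input).
    labels: 1/True if the candidate is a true loop closure, else 0/False.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos = labels.sum()
    if n_pos == 0:
        return 0.0  # recall is undefined without positives; report 0 by convention here
    # Any threshold at or below the best-scoring false candidate would admit it,
    # so accept only candidates that score strictly higher than that candidate.
    neg_scores = scores[~labels]
    threshold = neg_scores.max() if neg_scores.size else -np.inf
    accepted = scores > threshold
    return float((accepted & labels).sum() / n_pos)

recall = max_recall_at_full_precision(
    scores=[0.9, 0.8, 0.7, 0.6, 0.3],
    labels=[1, 1, 0, 1, 0],
)
print(f"max recall at 100% precision: {recall:.3f}")
```

In this toy input the third candidate is a false loop with score 0.7, so only the two candidates scoring above it can be accepted, and one true loop (score 0.6) is missed.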

Paper Structure

This paper contains 17 sections, 6 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Given a query keyframe, the set of closest keyframes is retrieved through classical place recognition methods such as VLAD [arandjelovic2013all]. In this work, we introduce LoopGNN, a novel graph neural network that produces precise loop closure estimates via deep graph consensus among the retrieved set of closest keyframes. We find that our method scales across various types of deep keypoint encodings and outperforms baselines that rely only on pairs of keyframes to identify loop closures.
  • Figure 2: Overview of our LoopGNN approach: We create keyframes from robot trajectories and use a deep keypoint extractor such as XFeat [potje2024xfeat] to obtain keypoints for each image. Next, we fit a VLAD-based place recognition model that allows robust and fast retrieval of similar frames given a query frame (left). Then, given a query frame, we independently encode the keypoint descriptors of all frames (the query and the retrieved ones) using a NetVLAD layer and construct a neighborhood graph. We feed this attributed graph into a graph attention network to produce a deep consensus regarding loop closures among the keyframes of the neighborhood (middle). Finally, we extract the set of highest-scoring edge-wise predictions of the network and validate pairs of frames using RANSAC-based geometric verification (right).
  • Figure 3: Maximum-similarity clique construction: Given a robot trajectory, we extract keyframes at fixed distance intervals. Given a query keyframe, we retrieve the set of top-k closest keyframes using VLAD-based place recognition. We connect all of these keyframes with one another and take the resulting graph $\mathcal{G}(q)=(V_q, E_q)$ as input to our LoopGNN pipeline. At the output, we only consider the edge scores of the query edges denoted here.
  • Figure 4: A selection of keyframes from the TD2.0 sequence figure_8_2023-09-13-17-24-26, illustrating the degree of perceptual aliasing it contains.
  • Figure 5: Qualitative results comparing the loop closure predictions of XFeat [potje2024xfeat] + RANSAC (left) against LoopGNN (right) under a similar VLAD neighborhood retrieval of 1%, each at its maximum recall at 100% precision. Red dots represent false negative loop closures, while green dots represent true positive loop closures. The underlying scene is figure_8_2023-09-13-17-24-26 of the TartanDrive 2.0 dataset.
  • ...and 1 more figures
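
The clique construction described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions: each keyframe is taken to have one global descriptor vector (standing in for a VLAD descriptor), retrieval uses cosine similarity, and the function name `build_query_clique` and its parameters are hypothetical. The sketch does not reproduce the authors' implementation and applies no filtering of the retrieved frames beyond excluding the query itself.

```python
import numpy as np

def build_query_clique(descriptors, query_idx, k):
    """Return node ids, all clique edges, and the query edges for one query keyframe.

    descriptors: (N, D) array, one global descriptor per keyframe (assumed input).
    query_idx:   index of the query keyframe q.
    k:           number of closest keyframes to retrieve.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    # L2-normalise so that a dot product equals cosine similarity.
    norms = np.linalg.norm(descriptors, axis=1, keepdims=True)
    unit = descriptors / np.clip(norms, 1e-12, None)
    sims = unit @ unit[query_idx]
    sims[query_idx] = -np.inf  # never retrieve the query itself
    # Indices of the k most similar keyframes, most similar first.
    top_k = np.argsort(-sims)[:k]
    nodes = [query_idx] + top_k.tolist()
    # Fully connect the neighbourhood: undirected edges, no self-loops.
    edges = [
        (nodes[i], nodes[j])
        for i in range(len(nodes))
        for j in range(i + 1, len(nodes))
    ]
    # Only edges touching the query are scored at the output.
    query_edges = [(query_idx, n) for n in top_k.tolist()]
    return nodes, edges, query_edges

toy_descriptors = np.array(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2], [-1.0, 0.0]]
)
nodes, edges, query_edges = build_query_clique(toy_descriptors, query_idx=0, k=2)
print(nodes, edges, query_edges)
```

The returned `edges` correspond to $E_q$ and `nodes` to $V_q$ in $\mathcal{G}(q)=(V_q, E_q)$; a graph network would receive the full edge set, while only `query_edges` would be read out as loop closure candidates.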