GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection

Jingwen Yu; Hanjing Ye; Jianhao Jiao; Ping Tan; Hong Zhang

TL;DR

GV-Bench presents an open-source, modular benchmark to evaluate geometric verification (GV) for long-term loop closure detection by pairing a retrieval-based candidate generation stage with a RANSAC-based geometric verification stage using the fundamental matrix $\mathbf{F}$. It systematically compares six local feature matching methods (spanning handcrafted and learning-based approaches) across three long-term datasets and three conditional variation types, providing insights into when and why certain matchers excel. The study finds that learning-based sparse features (notably the SP+SG combination) generally offer robust verification performance, while dense/transformer-based methods like LoFTR achieve strong average precision (AP) under challenging conditions; however, perceptual aliasing and ground-truth ambiguities remain significant challenges. The work emphasizes directions for future research, including multi-condition data augmentation, improved outlier rejection, and broader verification strategies, and provides an extensible framework to advance GV research in robust SLAM.

Abstract

Visual loop closure detection is an important module in visual simultaneous localization and mapping (SLAM) that associates the current camera observation with previously visited places. Loop closures correct drift in trajectory estimation to build a globally consistent map. However, a false loop closure can be fatal, so verification is required as an additional step to ensure robustness by rejecting false-positive loops. Geometric verification is a well-acknowledged solution that leverages spatial clues provided by local feature matching to find true positives. Existing feature matching methods focus on homography and pose estimation in long-term visual localization and offer no reference point for geometric verification. To fill the gap, this paper proposes a unified benchmark targeting geometric verification of loop closure detection under long-term conditional variations. Furthermore, we evaluate six representative local feature matching methods (handcrafted and learning-based) under the benchmark, with in-depth analysis of limitations and future directions.
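
For readers unfamiliar with the geometric constraint behind the fundamental matrix $\mathbf{F}$ mentioned in the TL;DR, the standard epipolar relation and an inlier-counting score can be written as below. This is a sketch in assumed notation, not the paper's own formulation: $\mathbf{x}_q$ and $\mathbf{x}_c$ (homogeneous keypoints in the query and candidate images), $\mathcal{M}$ (the set of matched correspondences), $d$ (an epipolar error such as the Sampson distance), and the thresholds $\tau$ and $\theta$ are all introduced here for illustration.

```latex
% Epipolar constraint satisfied by a correct correspondence (x_q, x_c)
\mathbf{x}_c^{\top} \mathbf{F} \, \mathbf{x}_q = 0

% Verification score: number of matches consistent with the F estimated by RANSAC
s(q, c) = \bigl| \{ (\mathbf{x}_q, \mathbf{x}_c) \in \mathcal{M} \;:\; d(\mathbf{x}_q, \mathbf{x}_c; \mathbf{F}) < \tau \} \bigr|

% A candidate pair is accepted as a loop closure when s(q, c) \ge \theta
```

Under these assumptions, sweeping $\theta$ over the observed scores is what turns verification into a binary classification problem with a precision-recall curve.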

Paper Structure (25 sections, 2 equations, 11 figures, 4 tables)

Introduction
Related Works
Loop Closure Verification
Local Feature Matching
Evaluation Methodology
Benchmark Introduction
Oxford RobotCar Dataset [maddern20171]
Nordland Dataset [nordlanddataset]
UAcampus Dataset [liu2015keypoint]
Geometric Verification
Experimental Results
Evaluation Metrics
Comparison over Conditional Variations
Comparison over Runtime
Discussion
...and 10 more sections

Figures (11)

Figure 1: GV-Bench enables evaluation of geometric verification over long-term loop closure detection. The benchmark contains six sequences representing different conditional variations and is built upon three datasets [maddern20171, nordlanddataset, liu2015keypoint]. We carefully select and compare six representative local feature matching methods, among which SP.+SG. exhibits the best performance on average. However, the unfilled and unbalanced radar chart indicates considerable room for improvement. The metric used in the chart is maximum recall at 100% precision, which is explained in Sec. \ref{sec:metrics}. The detailed construction of benchmark sequences is described in Sec. \ref{sec:benchmark intro}.
Figure 2: Loop closure detection consists of two stages: retrieval and verification. Potential loop closure pairs $\{q_{i}, c_{i,j}\}$ detected by the retrieval stage are sent for verification. Each pair of images is examined under geometric constraints provided by local feature matching. RANSAC filters the matched correspondences to find the best set of inliers, whose count is used as the probability in binary classification.
Figure 3: The pipeline of the open-source benchmark consists of i) pre-processing the dataset, ii) randomly selecting a query set (if the dataset does not provide one), iii) retrieving verification candidates for each query, and iv) matching queries with candidates. The dashed modules (Datasets, Retrieval Methods, and Local Feature Matching) are extendable in the open-source framework, enabling easy customization for research purposes (i.e., enlarging sequences, using other retrieval methods, and evaluating new feature matching methods).
Figure 4: Examples of local feature matching under challenging long-term conditional changes. In the above figures, we visualize LoFTR matches with vanilla RANSAC. The inliers (green lines) and outliers (red lines) are highlighted. The numbers of inliers in Fig. \ref{sub:c} and \ref{sub:d} are counter-intuitive because RANSAC fails when false matches are dominant (a more detailed analysis is provided in Sec. \ref{sec:exp_diss}).
Figure 5: Precision-recall curve of the "Day" sequence. The marker annotates the maximum recall at 100% precision (MR).
...and 6 more figures
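
The maximum recall at 100% precision (MR) annotated in Figure 5 can be illustrated with a short sketch. It assumes that each candidate pair carries a verification score (for example, a RANSAC inlier count) and a binary ground-truth label. The function names and the toy scores below are hypothetical and are not taken from the benchmark's code or results.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float]]:
    """Sweep an acceptance threshold over the scores, from strict to loose.

    Returns one (precision, recall) point per distinct score value.
    `labels` holds 1 for a true loop closure and 0 for a false one.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank candidate pairs by descending score (hypothetical inlier counts).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    points: List[Tuple[float, float]] = []
    tp = fp = 0
    i = 0
    while i < len(order):
        # Pairs with equal scores are accepted together by any threshold.
        current = scores[order[i]]
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / (tp + fp), tp / total_pos))
    return points


def max_recall_at_full_precision(
    scores: Sequence[float], labels: Sequence[int]
) -> float:
    """Largest recall among thresholds that accept no false positive."""
    points = precision_recall_points(scores, labels)
    return max((r for p, r in points if p == 1.0), default=0.0)


# Toy example: six candidate pairs with made-up inlier counts.
toy_scores = [120, 95, 80, 40, 12, 5]
toy_labels = [1, 1, 0, 1, 0, 0]
mr = max_recall_at_full_precision(toy_scores, toy_labels)
```

In this toy case the two highest-scoring pairs are true loops and the third is false, so only two of the three true loops can be accepted before precision drops below 100%. A SLAM system that cannot tolerate a single false loop closure would operate at this point of the curve.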
