TOPO-Bench: An Open-Source Topological Mapping Evaluation Framework with Quantifiable Perceptual Aliasing

Jiaming Wang, Diwen Liu, Jizhuo Chen, Harold Soh

TL;DR

This work formalizes topological consistency as the fundamental property of topological maps, shows that localization accuracy provides an efficient and interpretable surrogate metric for it, and proposes the first quantitative measure of dataset ambiguity to enable fair comparisons across environments.

Abstract

Topological mapping offers a compact and robust representation for navigation, but progress in the field is hindered by the lack of standardized evaluation metrics, datasets, and protocols. Existing systems are assessed using different environments and criteria, preventing fair and reproducible comparisons. Moreover, a key challenge, perceptual aliasing, remains under-quantified despite its strong influence on system performance. We address these gaps by (1) formalizing topological consistency as the fundamental property of topological maps and showing that localization accuracy provides an efficient and interpretable surrogate metric, and (2) proposing the first quantitative measure of dataset ambiguity to enable fair comparisons across environments. To support this protocol, we curate a diverse benchmark dataset with calibrated ambiguity levels, implement and release deep-learned baseline systems, and evaluate them alongside classical methods. Our experiments and analysis yield new insights into the limitations of current approaches under perceptual aliasing. All datasets, baselines, and evaluation tools are fully open-sourced to foster consistent and reproducible research in topological mapping.

Paper Structure

This paper contains 22 sections, 11 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Illustration of topological consistency. Nodes $a$, $b$, $c$, and $d$ correspond to physical locations $A$, $B$, $C$, and $D$ in the environment. Location $A$ is route-close to $B$ and $C$, but far from $D$. In the constructed topological map, the path between $a$ and $d$ violates Edge Precision, while the missing edge between $a$ and $b$ violates Edge Recall. The connection between $a$ and $c$ correctly preserves topological consistency.
  • Figure 2: Examples of map, test, and ambiguous cases across datasets. Top row: A+P case from OpenLORIS \cite{shi2019openlorisscene}, where the test image differs from the map image due to a time-of-day change, and the ambiguous (false-positive) image appears visually close to the target. Bottom left: P.O. case from RobotCar \cite{RobotCarDatasetIJRR}, showing a map–test pair of the same location with the building façade temporarily covered during construction. Bottom right: A.O. case from RELLIS-3D \cite{jiang2020rellis3d}, where the ambiguous image resembles the test image except for differences in vegetation type. See Sec. \ref{sec:ambiguity-quantification} for case definitions.
  • Figure 3: Examples of evaluation scenarios with quantified ambiguity. (a) Ambiguous + Positive (A+P): The test sequence revisits a mapped region, but at least one distractor subsequence appears nearly as similar as the true match, making localization uncertain. (b) Positive Only (P.O.): The test sequence revisits a mapped region and the true subsequence clearly dominates all distractors, resulting in unambiguous localization. (c) Ambiguous Only (A.O.): The test sequence lies in a novel, unmapped region, yet one or more mapped subsequences spuriously appear visually similar, producing false matches. Omitting the intermediate trajectory (shown in magenta) yields the kidnapped-robot variant; otherwise, the scenario corresponds to a classical loop closure.
  • Figure 4: Localization accuracy for A+P, P.O., and A.O. test cases, and BLA, plotted against the decision threshold for all baseline methods. For SM-Med, GM, and PBU the threshold is directly tied to detection scores, while for FabMap and RatSLAM it is defined by more complex scoring functions. Results show that all baseline methods perform poorly in the A+P test case, highlighting the need for benchmarks that explicitly quantify ambiguity and for more sophisticated approaches to resolve perceptual aliasing. In contrast, methods generally perform well in the P.O. test case at low threshold values, since in the absence of perceptual aliasing this setting mainly tests the system’s ability to detect candidates (analogous to recall). For the A.O. test case, methods achieve high accuracy in the high-threshold regime, as the correct outcome is consistently to reject. Finally, the BLA case provides a balanced view across the three scenarios. Overall, these results suggest that current state-of-the-art methods cannot adequately handle scenarios where perceptual aliasing is present.
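
The Edge Precision and Edge Recall notions illustrated in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's evaluation code: the function name `edge_consistency`, the `close_thresh` parameter, and the distances are hypothetical, route-closeness is assumed to be a simple distance threshold, and the erroneous path between $a$ and $d$ is simplified to a single direct edge.

```python
def edge_consistency(edges, route_dist, close_thresh):
    """Return (edge_precision, edge_recall) for a topological map.

    edges:        iterable of (u, v) node pairs present in the map.
    route_dist:   dict mapping frozenset({u, v}) to the route distance
                  between the corresponding physical locations.
    close_thresh: hypothetical cutoff; pairs at or below it are "route-close".
    Map edges with no entry in route_dist are treated as not route-close.
    """
    map_edges = {frozenset(e) for e in edges}
    close_pairs = {pair for pair, d in route_dist.items() if d <= close_thresh}
    correct = map_edges & close_pairs

    # Precision: fraction of map edges that join route-close locations.
    precision = len(correct) / len(map_edges) if map_edges else 1.0
    # Recall: fraction of route-close pairs that the map actually connects.
    recall = len(correct) / len(close_pairs) if close_pairs else 1.0
    return precision, recall


# Figure 1 setup with made-up distances: A is close to B and C, far from D.
route_dist = {
    frozenset({"a", "b"}): 1.0,
    frozenset({"a", "c"}): 1.0,
    frozenset({"a", "d"}): 10.0,
}
# The map keeps a-c, wrongly links a-d, and misses a-b.
map_edges = [("a", "c"), ("a", "d")]

precision, recall = edge_consistency(map_edges, route_dist, close_thresh=2.0)
print(precision, recall)  # 0.5 0.5
```

In this toy case the spurious $a$–$d$ link lowers precision and the missing $a$–$b$ link lowers recall, each to one half, while the $a$–$c$ edge counts as correct in both.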