Look Ma, No Ground Truth! Ground-Truth-Free Tuning of Structure from Motion and Visual SLAM

Alejandro Fontan, Javier Civera, Tobias Fischer, Michael Milford

TL;DR

The paper addresses evaluating SfM and Visual SLAM without costly ground-truth data by introducing the Ground-Truth-Free Absolute Trajectory Error (GTF ATE). It develops a Jacobian-based sensitivity framework and a sensitivity-sampling procedure that perturbs input images with Gaussian noise, enabling end-to-end GT-free evaluation and hyperparameter tuning. Across multiple datasets and two pipelines (GLOMAP for SfM and DROID-SLAM for VSLAM), the GT-free metric correlates strongly with standard GT-based ATE and yields tuning improvements comparable to those obtained with ground truth. Removing the ground-truth requirement broadens the usable data sources and lays a foundation for self-supervised, online tuning of SLAM systems beyond curated benchmarks.

Abstract

Evaluation is critical to both developing and tuning Structure from Motion (SfM) and Visual SLAM (VSLAM) systems, but is universally reliant on high-quality geometric ground truth -- a resource that is not only costly and time-intensive but, in many cases, entirely unobtainable. This dependency on ground truth restricts SfM and SLAM applications across diverse environments and limits scalability to real-world scenarios. In this work, we propose a novel ground-truth-free (GTF) evaluation methodology that eliminates the need for geometric ground truth, instead using sensitivity estimation via sampling from both original and noisy versions of input images. Our approach shows strong correlation with traditional ground-truth-based benchmarks and supports GTF hyperparameter tuning. Removing the need for ground truth opens up new opportunities to leverage a much larger number of dataset sources, and for self-supervised and online tuning, with the potential for a data-driven breakthrough analogous to what has occurred in generative AI.
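The sampling idea in the abstract (run the pipeline on the original images and on noise-perturbed copies, then measure how far the estimated trajectory moves) can be illustrated with a minimal sketch. This is not the authors' implementation: `run_pipeline` is a hypothetical callable standing in for an SfM/VSLAM system that returns an N×3 array of camera positions, the function names are invented for illustration, and the rigid (rotation plus translation) alignment is a simplification, since monocular evaluations usually also estimate scale.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Return a copy of an 8-bit image with zero-mean Gaussian noise added."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)

def align_rigid(source, target):
    """Least-squares rotation + translation (Kabsch) mapping source onto target."""
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    H = (source - mu_s).T @ (target - mu_t)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (R @ (source - mu_s).T).T + mu_t

def ate_rmse(estimate, reference):
    """Root-mean-square position error after rigid alignment."""
    aligned = align_rigid(estimate, reference)
    return float(np.sqrt(np.mean(np.sum((aligned - reference) ** 2, axis=1))))

def gtf_ate_sketch(images, run_pipeline, sigma, num_samples, seed=0):
    """Average trajectory deviation between noisy runs and the noise-free run.

    Assumption: the noise-free trajectory plays the role of the reference;
    the paper's exact estimator may differ.
    """
    rng = np.random.default_rng(seed)
    baseline = run_pipeline(images)
    errors = []
    for _ in range(num_samples):
        noisy = [add_gaussian_noise(im, sigma, rng) for im in images]
        errors.append(ate_rmse(run_pipeline(noisy), baseline))
    return float(np.mean(errors))
```

A lower value indicates a configuration whose output is less sensitive to image noise, which is the quantity the abstract proposes as a proxy when no geometric ground truth is available.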

Paper Structure

This paper contains 23 sections, 7 equations, 10 figures, 3 tables, 1 algorithm.

Figures (10)

  • Figure 1: Illustration of ground-truth-free tuning in GLOMAP. The green line fits Absolute Trajectory Error (ATE) results of GLOMAP as we vary one of its hyperparameters, specifically the maximum reprojection error for inliers in the Bundle Adjustment (Max. BA $e_r$), in radians. Note that, while the default value of Max. BA $e_r$ in GLOMAP is $10^{-2}$, leading to an ATE of $1.3$ mm, the optimal value for this particular sequence is $\simeq 10^{-3}$, for which the ATE improves by $\simeq 40\%$, reaching $0.8$ mm. Now look at our proposed GTF ATE curve in pink, which, without ground truth, is able to mimic the relative GLOMAP performance for different values of the hyperparameter, and hence also to identify its optimal setting.
  • Figure 2: Benchmarking Structure-from-Motion and Visual SLAM Without Ground Truth. The figure showcases the potential capabilities of our Ground-Truth-Free Absolute Trajectory Error (GTF ATE), enabling formative feedback, live feedback, and comprehensive performance evaluation and benchmarking of SfM and VSLAM, all without relying on actual ground truth data.
  • Figure 3: Experimental assessment of GLOMAP's linearity. Our ground-truth-free tuning assumes a high degree of linearity in SfM/VSLAM pipelines. To assess this hypothesis, we run GLOMAP [pan2024global] $k_\Delta$ times for images perturbed with noises of different variances $\Delta \sigma$. Note, in the fit to the first values that we draw, how the ATE shows a high degree of linearity in its evolution.
  • Figure 4: Green left Y-axis shows the ATE computed using ground truth. Pink right Y-axis shows our GTF ATE. $\bullet$ Blue dots indicate the ATE of GLOMAP operating with nominal parameters. $\bullet$ Minimum ATE achieved when fine-tuning with ground truth. $\bullet$ Minimum ATE achieved using our GTF ATE, without requiring ground truth data.
  • Figure 5: Green left Y-axis shows the ATE computed using ground truth. Pink right Y-axis shows our GTF ATE. $\bullet$ Blue dots indicate the ATE of GLOMAP operating with nominal parameters. $\bullet$ Minimum ATE achieved when fine-tuning with ground truth. $\bullet$ Minimum ATE achieved using our GTF ATE, without requiring ground truth data.
  • ...and 5 more figures
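
The tuning use case described in the Figure 1 caption (sweep one hyperparameter, keep the value with the lowest score) can be sketched as follows. The helper names are hypothetical and `score_fn` is an assumed stand-in for whatever metric is available, GT-based ATE or a ground-truth-free proxy; the toy score below is invented purely to make the example runnable and carries no data from the paper. The second helper reproduces the caption's arithmetic: going from 1.3 mm to 0.8 mm is a relative improvement of about 38%, consistent with the quoted $\simeq 40\%$.

```python
import math

def select_hyperparameter(candidates, score_fn):
    """Evaluate score_fn on each candidate and return (best_value, all_scores)."""
    scores = {value: score_fn(value) for value in candidates}
    best = min(scores, key=scores.get)   # lower error is better
    return best, scores

def relative_improvement(baseline_error, tuned_error):
    """Fractional error reduction relative to the baseline."""
    return (baseline_error - tuned_error) / baseline_error

# Toy score with a minimum near 1e-3 (illustrative only, not measured data).
toy_score = lambda value: (math.log10(value) + 3.0) ** 2

best, scores = select_hyperparameter([1e-4, 1e-3, 1e-2], toy_score)
improvement = relative_improvement(1.3, 0.8)   # values quoted in the Figure 1 caption
```

Because only the location of the minimum matters for tuning, a proxy metric needs to preserve the ranking of configurations rather than the absolute error values, which is the property the Figure 1 caption attributes to the GTF ATE curve.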