A Framework for Reproducible Benchmarking and Performance Diagnosis of SLAM Systems

Nikola Radulov, Yuhao Zhang, Mihai Bujanca, Ruiqi Ye, Mikel Luján

TL;DR

SLAMFuse tackles the challenge of reproducible, cross-sensor benchmarking for SLAM by providing a containerized framework that decouples algorithms and datasets via Docker volumes, enabling plug-and-play evaluation across visual and LiDAR modalities. It introduces a data fuzzing mechanism to stress-test SLAM robustness and a performance-diagnostics toolkit for frame-level failure analysis, all within a reproducible environment. Through extensive experiments on KITTI, Newer College, DARPA Subterranean, TUM/UZH-FPV, and other datasets, the framework demonstrates modality-specific strengths, failure modes, and the impact of data perturbations on trajectory accuracy and loop closures. The work emphasizes reproducibility and portability, offering ready-to-use dockerized algorithms, datasets, and guidelines, with a measured Docker overhead of roughly 5–10%, enabling reliable benchmarking across platforms and future algorithm integration.

Abstract

We propose SLAMFuse, an open-source SLAM benchmarking framework that provides consistent cross-platform environments for evaluating multi-modal SLAM algorithms, along with tools for data fuzzing, failure detection, and diagnosis across different datasets. Our framework introduces a fuzzing mechanism to test the resilience of SLAM algorithms against dataset perturbations. This enables the assessment of pose estimation accuracy under varying conditions and identifies critical perturbation thresholds. SLAMFuse improves diagnostics with failure detection and analysis tools, examining algorithm behaviour against dataset characteristics. SLAMFuse uses Docker to ensure reproducible testing conditions across diverse datasets and systems by streamlining dependency management. Emphasizing the importance of reproducibility and introducing advanced tools for algorithm evaluation and performance diagnosis, our work sets a new precedent for reliable benchmarking of SLAM systems. We provide ready-to-use Docker-compatible versions of the algorithms and datasets used in the experiments, together with guidelines for integrating and benchmarking new algorithms. Code is available at https://github.com/nikolaradulov/slamfuse
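To make the fuzzing idea concrete, the following is a minimal sketch of an image perturbation and a threshold search, not the SLAMFuse implementation. It assumes frames are 8-bit grayscale images stored as nested lists, that brightness is an additive offset and contrast a gain around mid-grey (128), and that a per-level trajectory error has already been measured elsewhere. The names `perturb_frame` and `critical_threshold` are hypothetical and do not come from the SLAMFuse code base.

```python
from typing import List, Optional, Sequence

Frame = List[List[int]]  # assumed representation: rows of 8-bit grayscale pixels


def clamp_u8(value: float) -> int:
    """Round a value and clamp it to the valid 8-bit range [0, 255]."""
    return max(0, min(255, int(round(value))))


def perturb_frame(frame: Frame, brightness: float = 0.0, contrast: float = 1.0) -> Frame:
    """Apply out = contrast * (p - 128) + 128 + brightness to every pixel.

    This linear model is an assumption made for illustration only.
    """
    return [
        [clamp_u8(contrast * (p - 128) + 128 + brightness) for p in row]
        for row in frame
    ]


def critical_threshold(
    levels: Sequence[float], errors: Sequence[float], max_error: float
) -> Optional[float]:
    """Return the first perturbation level whose error exceeds max_error.

    levels and errors are assumed to be paired and ordered by increasing
    perturbation strength. Returns None if no level exceeds the bound.
    """
    for level, error in zip(levels, errors):
        if error > max_error:
            return level
    return None
```

In use, one would perturb every frame of a sequence at each level, run the algorithm under test, record a trajectory error per level, and pass those errors to `critical_threshold` with an accuracy bound chosen for the application.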

Paper Structure

This paper contains 26 sections, 1 equation, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: Architecture of the SLAMFuse benchmarking framework. The source code of the algorithm and the raw dataset are encapsulated in separate Docker volumes, which are mounted into the SLAMFuse Docker container. SLAMFuse then takes frames from the dataset volume, applies perturbations, and passes them to the algorithm. Different metrics are computed simultaneously for analysis.
  • Figure 2: Comparison of ATE RMSE on KITTI sequences.
  • Figure 3: (a) Variations in the RPE of ORB-SLAM3 when applied to the Newer College dataset and (b) associated frames with high RPE where most close points are extracted from a dynamic person.
  • Figure 4: Brightness perturbation: ORB-SLAM3 (top), LSD-SLAM (middle), ElasticFusion (bottom) on TUM Freiburg 1 xyz (left) and TUM Freiburg 1 Desk (right).
  • Figure 5: Contrast perturbation: ORB-SLAM3 (top), LSD-SLAM (middle), ElasticFusion (bottom) on TUM Freiburg 1 xyz (left) and TUM Freiburg 1 Desk (right).
  • ...and 3 more figures
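
The captions above refer to ATE RMSE and RPE. As a rough illustration of what these metrics measure, here is a simplified, translation-only sketch. It assumes the estimated and ground-truth trajectories are already time-associated and expressed in the same frame (no alignment step is performed), and it ignores rotation, whereas the usual RPE definition compares full relative poses. The function names are hypothetical and this is not the evaluation code used in the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # assumed: one 3D position per frame


def _diff(a: Point, b: Point) -> Point:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _norm(v: Point) -> float:
    """Euclidean length of a 3D vector."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def ate_rmse(estimated: Sequence[Point], ground_truth: Sequence[Point]) -> float:
    """Root-mean-square of per-frame position errors (pre-aligned trajectories)."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [_norm(_diff(e, g)) ** 2 for e, g in zip(estimated, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))


def rpe_translation(
    estimated: Sequence[Point], ground_truth: Sequence[Point], delta: int = 1
) -> List[float]:
    """Per-frame error between estimated and true motion over delta frames.

    Translation-only simplification: compares displacement vectors rather
    than full relative poses.
    """
    errors = []
    for i in range(len(estimated) - delta):
        est_motion = _diff(estimated[i + delta], estimated[i])
        gt_motion = _diff(ground_truth[i + delta], ground_truth[i])
        errors.append(_norm(_diff(est_motion, gt_motion)))
    return errors
```

A per-frame RPE series like the one returned by `rpe_translation` is the kind of signal that lets spikes be traced back to individual frames, as in the Figure 3 analysis; ATE RMSE condenses a whole sequence into one number, as in the Figure 2 comparison.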