A Framework for Reproducible Benchmarking and Performance Diagnosis of SLAM Systems
Nikola Radulov, Yuhao Zhang, Mihai Bujanca, Ruiqi Ye, Mikel Luján
TL;DR
SLAMFuse tackles the challenge of reproducible, cross-sensor SLAM benchmarking by providing a containerized framework that decouples algorithms from datasets via Docker volumes, enabling plug-and-play evaluation across visual and LiDAR modalities. Within this reproducible environment, it introduces a data fuzzing mechanism to stress-test SLAM robustness and a performance-diagnostics toolkit for frame-level failure analysis. Through extensive experiments on KITTI, Newer College, DARPA Subterranean, TUM/UZH-FPV, and other datasets, the framework reveals modality-specific strengths and failure modes, as well as the impact of data perturbations on trajectory accuracy and loop closures. The work emphasizes reproducibility and portability: it offers ready-to-use Dockerized algorithms and datasets together with integration guidelines, at a measured Docker overhead of roughly 5–10%, enabling reliable benchmarking across platforms and straightforward integration of future algorithms.
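To illustrate what frame-level failure analysis can mean in practice, the following is a minimal sketch, not SLAMFuse's actual diagnostics code. It assumes the estimated and ground-truth trajectories are already time-associated (one position per frame) and aligned to a common coordinate frame, and it flags every frame whose Euclidean position error exceeds a threshold. The function name `flag_failed_frames` and the 0.5 m default threshold are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence, Tuple

# A 3D position (x, y, z) in metres.
Position = Tuple[float, float, float]


def flag_failed_frames(
    estimated: Sequence[Position],
    ground_truth: Sequence[Position],
    max_error_m: float = 0.5,  # hypothetical failure threshold
) -> List[int]:
    """Return the indices of frames whose position error exceeds max_error_m.

    Assumes both trajectories are time-associated (same length, one entry
    per frame) and already aligned to the same coordinate frame.
    """
    if len(estimated) != len(ground_truth):
        raise ValueError("trajectories must have the same number of frames")
    failed = []
    for i, (est, gt) in enumerate(zip(estimated, ground_truth)):
        # Euclidean translational error for this frame.
        err = math.dist(est, gt)
        if err > max_error_m:
            failed.append(i)
    return failed


if __name__ == "__main__":
    est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.9, 0.0), (3.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(flag_failed_frames(est, gt))  # [2]
```

Flagged indices can then be mapped back to the corresponding sensor frames and inspected against dataset characteristics such as motion speed or scene texture.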
Abstract
We propose SLAMFuse, an open-source SLAM benchmarking framework that provides consistent cross-platform environments for evaluating multi-modal SLAM algorithms, along with tools for data fuzzing, failure detection, and diagnosis across different datasets. Our framework introduces a fuzzing mechanism to test the resilience of SLAM algorithms against dataset perturbations. This enables assessing pose estimation accuracy under varying conditions and identifying critical perturbation thresholds. SLAMFuse improves diagnostics through failure detection and analysis tools that relate algorithm behaviour to dataset characteristics. By using Docker to streamline dependency management, SLAMFuse ensures reproducible testing conditions across diverse datasets and systems. By emphasizing reproducibility and introducing advanced tools for algorithm evaluation and performance diagnosis, our work sets a new precedent for reliable benchmarking of SLAM systems. We provide ready-to-use Docker-compatible versions of the algorithms and datasets used in the experiments, together with guidelines for integrating and benchmarking new algorithms. Code is available at https://github.com/nikolaradulov/slamfuse
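One way to identify a critical perturbation threshold is to search over perturbation intensity for the point at which trajectory error first exceeds an acceptable limit. The sketch below uses bisection and is not necessarily how SLAMFuse does it. It assumes a hypothetical callable `ate_at(intensity)` that fuzzes the dataset at the given intensity, runs the SLAM system, and returns the absolute trajectory error (ATE). Bisection is only valid if ATE is non-decreasing in intensity; because real SLAM runs are often nondeterministic, `ate_at` would in practice need to average several runs.

```python
import math
from typing import Callable


def critical_threshold(
    ate_at: Callable[[float], float],  # hypothetical: fuzz, run SLAM, return ATE
    ate_limit: float,
    lo: float = 0.0,
    hi: float = 1.0,
    tol: float = 1e-3,
) -> float:
    """Bisection search for the smallest intensity whose ATE exceeds ate_limit.

    Assumes ATE is non-decreasing in perturbation intensity on [lo, hi].
    Returns an upper estimate of the threshold, accurate to within tol.
    """
    if ate_at(lo) > ate_limit:
        return lo  # already failing with the weakest perturbation
    if ate_at(hi) <= ate_limit:
        raise ValueError("no failure within [lo, hi]; widen the search range")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if ate_at(mid) > ate_limit:
            hi = mid  # failure at mid: threshold is at or below mid
        else:
            lo = mid  # still acceptable at mid: threshold is above mid
    return hi


if __name__ == "__main__":
    # Synthetic, made-up ATE model standing in for real fuzzed SLAM runs.
    def synthetic_ate(s: float) -> float:
        return 0.05 + 2.0 * s * s

    t = critical_threshold(synthetic_ate, ate_limit=0.5)
    print(round(t, 2))  # 0.47 (exact crossing is sqrt(0.225) ~ 0.474)
```

Bisection keeps the number of expensive SLAM runs logarithmic in the required resolution; a plain sweep over a fixed grid of intensities is a simpler alternative when the error curve is not monotonic.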
