From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking
Xiaohao Xu, Tianyi Zhang, Sibo Wang, Xiang Li, Yongqi Chen, Ye Li, Bhiksha Raj, Matthew Johnson-Roberson, Xiaonan Huang
TL;DR
This work addresses the robustness gap of RGB-D SLAM in unstructured environments by introducing a comprehensive perturbation taxonomy and a customizable noisy data synthesis pipeline that converts Perfect World simulations into Noisy World scenarios. It instantiates the Noisy-Replica benchmark from Replica indoor scenes, producing 1,000 long RGB-D sequences across 124 perturbation settings, and uses it to evaluate neural (NeRF-based and Gaussian Splatting-based) and classical SLAM models under varied disturbances. SLAM under perturbations is formalized via the posterior $p(\mathbf{m}, \mathbf{x}_{1:t} \mid \mathbf{z}_{1:t})$ over the map $\mathbf{m}$ and the trajectory $\mathbf{x}_{1:t}$ given the observations $\mathbf{z}_{1:t}$. The study reveals that many advanced SLAM models remain vulnerable to perturbations, especially dynamic and mixed ones, highlighting the need for robustness-focused design and evaluation. By providing a scalable, standardized evaluation framework with rich perturbation settings, the work is intended to support the development of robust embodied SLAM systems for real-world deployment, with potential extensions to broader modalities and to active and multi-agent scenarios.
Abstract
Embodied agents require robust navigation systems to operate in unstructured environments, making the robustness of Simultaneous Localization and Mapping (SLAM) models critical to embodied agent autonomy. While real-world datasets are invaluable, simulation-based benchmarks offer a scalable approach to robustness evaluation. However, the creation of a challenging and controllable noisy world with diverse perturbations remains under-explored. To this end, we propose a novel, customizable pipeline for noisy data synthesis, aimed at assessing the resilience of multi-modal SLAM models against various perturbations. The pipeline comprises a comprehensive taxonomy of sensor and motion perturbations for embodied multi-modal (specifically RGB-D) sensing, categorized by their sources and propagation order, allowing for procedural composition. We also provide a toolbox for synthesizing these perturbations, enabling the transformation of clean environments into challenging noisy simulations. Utilizing the pipeline, we instantiate the large-scale Noisy-Replica benchmark, which includes diverse perturbation types, to evaluate the risk tolerance of existing advanced RGB-D SLAM models. Our extensive analysis uncovers the susceptibilities of both neural (NeRF-based and Gaussian Splatting-based) and non-neural SLAM models to disturbances, despite their demonstrated accuracy on standard benchmarks. Our code is publicly available at https://github.com/Xiaohao-Xu/SLAM-under-Perturbation.
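
To make the idea of procedural composition concrete, the following is a minimal Python sketch of how perturbations on an RGB-D frame might be chained in a fixed order. It is not the toolbox from the repository above: the function names (`gaussian_rgb_noise`, `depth_dropout`, `compose`), their parameters, and the two chosen perturbations are assumptions made purely for illustration.

```python
from functools import partial

import numpy as np


def gaussian_rgb_noise(rgb, depth, rng, sigma=10.0):
    """Hypothetical image-level perturbation: additive Gaussian noise on RGB."""
    noisy = rgb.astype(np.float32) + rng.normal(0.0, sigma, size=rgb.shape)
    # Clip back to the valid 8-bit range; depth passes through untouched.
    return np.clip(noisy, 0, 255).astype(np.uint8), depth


def depth_dropout(rgb, depth, rng, drop_prob=0.1):
    """Hypothetical depth-level perturbation: randomly invalidate depth pixels."""
    mask = rng.random(depth.shape) < drop_prob
    out = depth.copy()  # leave the clean input array unchanged
    out[mask] = 0.0     # 0 marks a missing depth reading in this sketch
    return rgb, out


def compose(*perturbations):
    """Chain perturbations so each one sees the output of the previous one."""
    def apply(rgb, depth, rng):
        for perturb in perturbations:
            rgb, depth = perturb(rgb, depth, rng)
        return rgb, depth
    return apply


# Usage: a clean synthetic frame turned into a noisy one.
rng = np.random.default_rng(0)
clean_rgb = np.full((48, 64, 3), 128, dtype=np.uint8)
clean_depth = np.full((48, 64), 2.0, dtype=np.float32)

pipeline = compose(
    partial(gaussian_rgb_noise, sigma=10.0),
    partial(depth_dropout, drop_prob=0.2),
)
noisy_rgb, noisy_depth = pipeline(clean_rgb, clean_depth, rng)
```

Each perturbation here takes and returns an `(rgb, depth)` pair, so the order of arguments to `compose` determines the order in which the perturbations are applied. A motion-level perturbation, such as a deviation applied to the camera trajectory, would need a different interface and is left out of this sketch.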
