
From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking

Xiaohao Xu, Tianyi Zhang, Sibo Wang, Xiang Li, Yongqi Chen, Ye Li, Bhiksha Raj, Matthew Johnson-Roberson, Xiaonan Huang

TL;DR

This work addresses the robustness gap of RGB-D SLAM in unstructured environments by introducing a comprehensive perturbation taxonomy and a customizable noisy data synthesis pipeline that converts Perfect World simulations into Noisy World scenarios. It instantiates the Noisy-Replica benchmark from Replica indoor scenes, producing 1,000 long RGB-D sequences across 124 perturbation settings to evaluate neural (NeRF-based and Gaussian Splatting-based) and classical SLAM models under varied disturbances. SLAM under perturbations is formalized via the posterior $p(\mathbf{m}, \mathbf{x}_{1:t} \mid \mathbf{z}_{1:t})$ over the map and the trajectory given the observations. The study reveals that many advanced SLAM models remain vulnerable to perturbations, especially dynamic and mixed ones, highlighting the need for robustness-focused design and evaluation. By providing a scalable, standardized evaluation framework and rich perturbation settings, the work aims to support the development of robust embodied SLAM systems for real-world deployment, with potential extensions to broader modalities and active or multi-agent scenarios.

Abstract

Embodied agents require robust navigation systems to operate in unstructured environments, making the robustness of Simultaneous Localization and Mapping (SLAM) models critical to embodied agent autonomy. While real-world datasets are invaluable, simulation-based benchmarks offer a scalable approach for robustness evaluations. However, the creation of a challenging and controllable noisy world with diverse perturbations remains under-explored. To this end, we propose a novel, customizable pipeline for noisy data synthesis, aimed at assessing the resilience of multi-modal SLAM models against various perturbations. The pipeline comprises a comprehensive taxonomy of sensor and motion perturbations for embodied multi-modal (specifically RGB-D) sensing, categorized by their sources and propagation order, allowing for procedural composition. We also provide a toolbox for synthesizing these perturbations, enabling the transformation of clean environments into challenging noisy simulations. Utilizing the pipeline, we instantiate the large-scale Noisy-Replica benchmark, which includes diverse perturbation types, to evaluate the risk tolerance of existing advanced RGB-D SLAM models. Our extensive analysis uncovers the susceptibilities of both neural (NeRF-based and Gaussian Splatting-based) and non-neural SLAM models to disturbances, despite their demonstrated accuracy in standard benchmarks. Our code is publicly available at https://github.com/Xiaohao-Xu/SLAM-under-Perturbation.
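
The procedural composition of perturbations described in the abstract can be pictured with a minimal sketch. This is not the toolbox from the linked repository: the function names (`gaussian_rgb_noise`, `depth_dropout`, `compose`) and the parameters `sigma` and `drop_prob` are hypothetical, chosen only to show how individual RGB and depth corruptions might be chained in a fixed order to turn a clean frame into a noisy one.

```python
import numpy as np

# Hypothetical perturbations; names and parameters are illustrative assumptions,
# not the API of the paper's toolbox.

def gaussian_rgb_noise(rgb, depth, rng, sigma=10.0):
    """Add zero-mean Gaussian noise to an 8-bit RGB image; depth is untouched."""
    noisy = rgb.astype(np.float32) + rng.normal(0.0, sigma, rgb.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8), depth

def depth_dropout(rgb, depth, rng, drop_prob=0.1):
    """Zero out a random fraction of depth pixels to mimic missing measurements."""
    mask = rng.random(depth.shape) < drop_prob
    out = depth.copy()
    out[mask] = 0.0
    return rgb, out

def compose(*perturbations):
    """Chain perturbations so they are applied in the order given."""
    def apply(rgb, depth, rng):
        for perturb in perturbations:
            rgb, depth = perturb(rgb, depth, rng)
        return rgb, depth
    return apply

# A tiny synthetic "clean" RGB-D frame stands in for a rendered one.
rng = np.random.default_rng(0)
rgb = np.full((4, 4, 3), 128, dtype=np.uint8)
depth = np.ones((4, 4), dtype=np.float32)

noisy_world = compose(gaussian_rgb_noise, depth_dropout)
noisy_rgb, noisy_depth = noisy_world(rgb, depth, rng)
```

Passing a single seeded generator through the chain keeps a mixed perturbation setting reproducible, which is one plausible way to keep a benchmark of this kind controllable.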

Paper Structure (35 sections, 3 equations, 22 figures, 27 tables)


Figures (22)

  • Figure 1: Noisy data synthesis for robustness evaluation of embodied perception (specifically SLAM) models under perturbations. Our insight is to customize perturbations (red blocks) during conventional procedural (clean) data generation (blue blocks).
  • Figure 2: Taxonomy of perturbations for embodied RGB-D sensing. The sources of perturbations include: (a) sensor pose errors, (b) RGB and (c) depth imaging corruptions, and (d) RGB-D sensor synchronization errors. Dashed arrows illustrate the propagation order of individual perturbations.
  • Figure 3: Performance (measured by ATE$\downarrow$ (m)) of Neural SLAM models under diverse perturbations. For visualization, sequences resulting in failure are assigned an ATE value of 1.0.
  • Figure 4: Performance (measured by ATE$\downarrow$ (m), RPE$\downarrow$ (m), and SR$\uparrow$) of ORB-SLAM3 under diverse perturbations. For visualization, sequences resulting in failure are assigned an ATE/RPE value of 1.0 and a Success Rate of 0.
  • Figure A: Rendered RGB image streams under trajectory-level perturbations, including translation deviations (Translate), rotation deviations (Rotate), and the faster motion effect.
  • ...and 17 more figures
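
The plotting convention in the captions of Figures 3 and 4 (failed sequences receive an ATE of 1.0 and a Success Rate of 0) can be illustrated with a short sketch. It rests on assumptions: the trajectories are taken to be already aligned and time-associated, ATE is computed as the RMSE of per-frame translational error, and Success Rate is taken as the fraction of sequences that did not fail. The helper names are hypothetical and do not come from the paper's code.

```python
import numpy as np

FAILURE_ATE = 1.0  # value assigned to failed sequences for visualization (per the captions)

def ate_rmse(est_xyz, gt_xyz):
    """RMSE of translational error in meters; assumes aligned, associated poses."""
    est = np.asarray(est_xyz, dtype=float)
    gt = np.asarray(gt_xyz, dtype=float)
    errors = np.linalg.norm(est - gt, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))

def ate_for_plot(est_xyz, gt_xyz, failed):
    """Return the ATE to plot, substituting the failure value when a run failed."""
    return FAILURE_ATE if failed else ate_rmse(est_xyz, gt_xyz)

def success_rate(failed_flags):
    """Fraction of sequences that did not fail (an assumed definition of SR)."""
    flags = np.asarray(failed_flags, dtype=bool)
    return float(1.0 - flags.mean())

# Toy example: every estimated position is offset by (0.3, 0.4, 0.0) m.
gt = np.zeros((3, 3))
est = gt + np.array([0.3, 0.4, 0.0])
ate_ok = ate_for_plot(est, gt, failed=False)
ate_failed = ate_for_plot(None, None, failed=True)
sr = success_rate([True, False, False, False])
```

Capping failures at a fixed value keeps crashed or diverged runs visible on the same axis as successful ones, at the cost of hiding how badly they failed.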