DYNEMO-SLAM: Dynamic Entity and Motion-Aware 3D Scene Graph SLAM

Marco Giberna, Muhammad Shaheer, Miguel Fernandez-Cortizas, Jose Andres Millan-Romera, Jose Luis Sanchez-Lopez, Holger Voos

TL;DR

This work tackles SLAM in highly dynamic environments by coupling 3D scene graphs with a unified backend that jointly optimizes robot trajectory, static structure, and dynamic entity poses. It introduces semantic motion priors, a dynamic keyframe policy, and dynamic entity-aware loop closure to maintain robustness, while using a multi-sensor front-end to detect and encode dynamic elements. Experimental results show substantial improvements in localization accuracy (ATE) over baselines and real-time performance, validating the benefits of treating dynamic entities as persistent, semantically-anchored landmarks. The approach enhances scene representation and loop-closure reliability in cluttered, dynamic settings, with clear pathways for marker-free perception and richer semantic layering in future work.

Abstract

Robots operating in dynamic environments face significant challenges due to the presence of moving agents and displaced objects. Traditional SLAM systems typically assume a static world or treat dynamic elements as outliers, discarding their information to preserve map consistency. As a result, they cannot exploit dynamic entities as persistent landmarks, do not model their motion over time, and therefore quickly degrade in highly cluttered environments with few reliable static features. This paper presents a novel 3D scene graph-based SLAM framework that models dynamic entities and estimates their poses within the SLAM backend. Our framework incorporates semantic motion priors and dynamic entity-aware constraints to jointly optimize the robot trajectory, dynamic entity poses, and the surrounding environment structure within a unified graph formulation. In parallel, a dynamic keyframe selection policy and a semantic loop-closure prefiltering step keep the system robust in highly dynamic environments by continuously adapting to scene changes and filtering inconsistent observations. Simulation and real-world experiments show a 49.97% reduction in ATE compared to the baseline method, demonstrating that incorporating dynamic entities and estimating their poses improves robustness and yields a richer scene representation in complex scenarios while maintaining real-time performance.
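To make the reported metric concrete, the following is a minimal sketch of how an ATE figure and a relative reduction against a baseline could be computed. It is not the paper's evaluation code. It assumes ATE is taken as the root-mean-square translational error, and that the estimated and ground-truth trajectories are already time-associated and expressed in the same frame. The function names `ate_rmse` and `relative_reduction` and all numbers are hypothetical.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of the translational error between two trajectories.

    Assumption: both inputs are equally long sequences of (x, y, z)
    positions that are already time-associated and aligned to a common frame.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est_pos, gt_pos))
        for est_pos, gt_pos in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline_ate, method_ate):
    """Percentage by which method_ate is lower than baseline_ate."""
    if baseline_ate <= 0:
        raise ValueError("baseline ATE must be positive")
    return 100.0 * (baseline_ate - method_ate) / baseline_ate


# Illustrative values only; these are not results from the paper.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
baseline_estimate = [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0), (2.0, 0.2, 0.0)]
method_estimate = [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (2.0, 0.1, 0.0)]

baseline_ate = ate_rmse(baseline_estimate, ground_truth)
method_ate = ate_rmse(method_estimate, ground_truth)
print(f"ATE reduction: {relative_reduction(baseline_ate, method_ate):.2f}%")
```

Under this reading, a reduction close to 50% means the method's trajectory error is roughly half that of the baseline on the same sequences.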

Paper Structure

This paper contains 17 sections, 10 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Hierarchical SLAM factor graph integrating dynamic entities (a walking human and a displaced chair) with keyframes, entities, and floor. Image frames show selected keyframes.
  • Figure 2: System Architecture. The proposed system processes synchronized RGB images, LiDAR scans, and odometry data to build a SLAM graph. The front-end (green) performs entity detection, keyframe selection, plane segmentation, entity-aware loop closure, and floor segmentation. The back-end (blue) registers detected entities and keyframes in the SLAM graph together with their relative constraints. The graph structure (right) organizes the environment hierarchically, associating keyframes with a detected agent (always observed in a new position in the depicted scenario), an object (initially observed twice at the same position, then at a different location), planes, and the floor. We highlight our contributions (green and blue) with respect to the modules reused from the employed baseline bavle_s-graphs_2023 (light green, light blue, and gray).
  • Figure 3: Entity Pose Estimation Error and Joint Optimization Effect. Estimated entity pose error (translational and rotational) with respect to the ground truth over time on the S-MASO3 dataset, for the Full method and the setups presented in Table \ref{tab:ablation_setups}: without the timer in subfigure \ref{fig:edge3_entity13_err}, and with all setups implementing the timer in subfigure \ref{fig:edge3_entity10_err_timer}.
  • Figure 4: Optimization time [s] versus the number of nodes in the factor graph for each setup and the baseline, where data is available, across all employed datasets.