DYNEMO-SLAM: Dynamic Entity and Motion-Aware 3D Scene Graph SLAM
Marco Giberna, Muhammad Shaheer, Miguel Fernandez-Cortizas, Jose Andres Millan-Romera, Jose Luis Sanchez-Lopez, Holger Voos
TL;DR
This work tackles SLAM in highly dynamic environments by coupling 3D scene graphs with a unified backend that jointly optimizes the robot trajectory, static structure, and dynamic entity poses. It introduces semantic motion priors, a dynamic keyframe policy, and dynamic entity-aware loop closure to maintain robustness, and uses a multi-sensor front-end to detect and encode dynamic elements. Experiments show substantially lower absolute trajectory error (ATE) than the baseline while retaining real-time performance, supporting the case for treating dynamic entities as persistent, semantically anchored landmarks. The approach improves scene representation and loop-closure reliability in cluttered, dynamic settings, and leaves marker-free perception and richer semantic layering as future work.
Abstract
Robots operating in dynamic environments face significant challenges due to the presence of moving agents and displaced objects. Traditional SLAM systems typically assume a static world or treat dynamic elements as outliers, discarding their information to preserve map consistency. As a result, they cannot exploit dynamic entities as persistent landmarks, do not model their motion over time, and therefore degrade quickly in highly cluttered environments with few reliable static features. This paper presents a novel 3D scene graph-based SLAM framework that integrates the modeling and pose estimation of dynamic entities into the SLAM backend. Our framework incorporates semantic motion priors and dynamic entity-aware constraints to jointly optimize the robot trajectory, dynamic entity poses, and the surrounding environment structure within a unified graph formulation. In parallel, a dynamic keyframe selection policy and a semantic loop-closure prefiltering step keep the system robust in highly dynamic environments by continuously adapting to scene changes and filtering out inconsistent observations. Simulation and real-world experiments show a 49.97% reduction in absolute trajectory error (ATE) compared to the baseline method, demonstrating that incorporating dynamic entities and estimating their poses improves robustness and yields a richer scene representation in complex scenarios while maintaining real-time performance.
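To make the reported metric concrete, the following is a minimal sketch of how an ATE value and a relative reduction against a baseline are commonly computed. It assumes that ATE is reported as the root-mean-square translational error, and that the estimated and ground-truth positions are already time-associated and expressed in a common frame (the trajectory alignment step usually applied beforehand is omitted). The function names `ate_rmse` and `relative_reduction` are illustrative and are not taken from the paper.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of translational error between two position sequences.

    Assumption: both sequences are time-associated and already aligned
    to a common frame; no alignment is performed here.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Squared Euclidean distance between each pair of matching positions.
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline_ate, proposed_ate):
    """Percentage reduction of the proposed ATE relative to the baseline."""
    if baseline_ate <= 0:
        raise ValueError("baseline ATE must be positive")
    return 100.0 * (baseline_ate - proposed_ate) / baseline_ate
```

Under this reading, a reduction close to 50% means the proposed system's trajectory error is roughly half that of the baseline on the same sequences.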
