UP-SLAM: Adaptively Structured Gaussian SLAM with Uncertainty Prediction in Dynamic Environments
Wancai Zheng, Linlin Ou, Jiajie He, Libo Zhou, Xinyi Yu, Yan Wei
TL;DR
UP-SLAM tackles the challenge of robust real-time SLAM in dynamic environments by decoupling tracking and mapping into a parallel pipeline. It encodes the scene with an adaptive 3DGS representation built on probabilistic anchors in a Bayesian octree, enabling automatic initialization and pruning of Gaussian primitives without manual thresholds. A training-free, multi-modal uncertainty estimator fuses residuals and DINO features to filter dynamic regions and refine motion masks, while a temporal encoding and lightweight MLPs enhance rendering and uncertainty prediction. The approach delivers state-of-the-art localization and rendering quality on dynamic datasets, while maintaining real-time performance and producing artifact-free static maps suitable for downstream tasks such as navigation and semantic understanding. These contributions offer a scalable, open-set-capable framework for robust dynamic SLAM in real-world robotics applications.
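To make the idea of a probabilistic anchor concrete, the following is a minimal sketch of a binary Bayes filter in log-odds form, the kind of update commonly used in probabilistic octrees. It is not taken from UP-SLAM: the function names, the clamping bounds, the observation probabilities, and the "more likely than not" keep rule are all assumptions chosen for illustration.

```python
import math


def logit(p):
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def update_log_odds(log_odds, p_obs, lo_min=-4.0, lo_max=4.0):
    """One Bayes-filter step: add the observation's log-odds, then clamp.

    The clamping bounds are hypothetical; they keep an anchor from becoming
    so confident that later contradicting evidence cannot change it.
    """
    return max(lo_min, min(lo_max, log_odds + logit(p_obs)))


def probability(log_odds):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical anchor: starts at the neutral prior 0.5 (log-odds 0).
anchor = 0.0

# Three observations that support the anchor being static geometry.
for p_obs in [0.7, 0.7, 0.7]:
    anchor = update_log_odds(anchor, p_obs)
p_after_support = probability(anchor)

# Five observations that contradict it (e.g. the region started moving).
for p_obs in [0.3] * 5:
    anchor = update_log_odds(anchor, p_obs)
p_after_contradiction = probability(anchor)

# Assumed keep rule: retain the anchor only while it is more likely than not.
keep_anchor = p_after_contradiction >= 0.5
```

In this sketch the decision falls out of the accumulated evidence relative to the neutral prior rather than a scene-specific tuned value, which is the general appeal of a Bayesian formulation; the criterion actually used by UP-SLAM is not stated in this summary.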
Abstract
Recent 3D Gaussian Splatting (3DGS) techniques for Visual Simultaneous Localization and Mapping (SLAM) have made significant progress in tracking and high-fidelity mapping. However, their sequential optimization framework and sensitivity to dynamic objects limit real-time performance and robustness in real-world scenarios. We present UP-SLAM, a real-time RGB-D SLAM system for dynamic environments that decouples tracking and mapping through a parallelized framework. A probabilistic octree is employed to manage Gaussian primitives adaptively, enabling efficient initialization and pruning without hand-crafted thresholds. To robustly filter dynamic regions during tracking, we propose a training-free uncertainty estimator that fuses multi-modal residuals to estimate per-pixel motion uncertainty, achieving open-set dynamic object handling without reliance on semantic labels. Furthermore, a temporal encoder is designed to enhance rendering quality. Concurrently, low-dimensional features are efficiently transformed via a shallow multilayer perceptron to construct DINO features, which are then employed to enrich the Gaussian field and improve the robustness of uncertainty prediction. Extensive experiments on multiple challenging datasets suggest that UP-SLAM outperforms state-of-the-art methods in both localization accuracy (by 59.8%) and rendering quality (by 4.57 dB PSNR), while maintaining real-time performance and producing reusable, artifact-free static maps in dynamic environments.
Project page: https://aczheng-cai.github.io/up_slam.github.io/
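As an illustration of what fusing multi-modal residuals into a per-pixel motion uncertainty could look like, here is a minimal training-free sketch. It is an assumption-laden example rather than the estimator used in UP-SLAM: the choice of three residuals (photometric, depth, feature), the normalisation scales, and the exponential mapping are all hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two feature vectors, clamped to >= 0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a missing feature as maximally dissimilar
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


def pixel_uncertainty(photo_res, depth_res, feat_rendered, feat_observed,
                      scales=(0.1, 0.05, 0.5)):
    """Fuse three residuals into an uncertainty in [0, 1).

    Each residual is divided by a hypothetical scale so the terms are
    comparable, combined as a sum of squares, and squashed so that
    0 means "consistent with the static map" and values near 1 mean
    "likely dynamic".
    """
    feat_res = cosine_distance(feat_rendered, feat_observed)
    terms = (photo_res / scales[0], depth_res / scales[1], feat_res / scales[2])
    energy = sum(t * t for t in terms)
    return 1.0 - math.exp(-0.5 * energy)


# A pixel that agrees with the rendered static map in all three modalities.
u_static = pixel_uncertainty(0.0, 0.0, [1.0, 0.0], [1.0, 0.0])

# A pixel with large residuals and an unrelated feature (e.g. a moving person).
u_dynamic = pixel_uncertainty(0.3, 0.2, [1.0, 0.0], [0.0, 1.0])

# A tracker could down-weight each pixel's loss by (1 - uncertainty).
w_static, w_dynamic = 1.0 - u_static, 1.0 - u_dynamic
```

Because the score depends only on how well the observation agrees with the rendered map, and not on a class label, a scheme of this kind can in principle react to object categories never seen before, which is the sense of "open-set" in the abstract.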
