Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM

Baicheng Li, Zike Yan, Dong Wu, Hanqing Jiang, Hongbin Zha

TL;DR

The paper tackles robust neural SLAM in dynamic environments by framing memory as a continual-learning problem: a neural map $f(\mathbf{x};\theta_M^t)$ memorizes static scene content, while an instance-aware classifier $g(\mathbf{z};\theta_C^t)$ identifies dynamic objects. By enforcing photometric and geometric consistency only on static regions via volume rendering and SDF-based losses, and by updating the classifier online with replay buffers, the method achieves reliable tracking and mapping despite moving objects. Key contributions include the first dense NeRF-based SLAM in dynamic scenes, an online continual-learning approach for an instance-level motion status classifier, and a forgetting-based perspective that leverages dynamic content to adapt the neural scene representation. The approach demonstrates robustness and adaptability on challenging datasets, with practical implications for long-term, open-world robotics operating in dynamic environments.
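The idea of enforcing consistency only on static regions can be illustrated with a small sketch. This is not the authors' implementation: it replaces volume rendering and the SDF-based losses with a plain L1 error on already-rendered values, and the names `static_masked_l1`, `static_consistency_loss`, `threshold`, and `depth_weight` are hypothetical choices made for illustration.

```python
from typing import Sequence


def static_masked_l1(
    rendered: Sequence[float],
    observed: Sequence[float],
    dynamic_prob: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> float:
    """Mean absolute error over samples whose dynamic probability is below threshold."""
    total, count = 0.0, 0
    for r, o, p in zip(rendered, observed, dynamic_prob):
        if p < threshold:  # keep only samples judged static
            total += abs(r - o)
            count += 1
    # With no static samples there is nothing to constrain, so return 0.
    return total / count if count else 0.0


def static_consistency_loss(
    rgb_hat: Sequence[float],
    rgb: Sequence[float],
    depth_hat: Sequence[float],
    depth: Sequence[float],
    dynamic_prob: Sequence[float],
    depth_weight: float = 1.0,  # hypothetical weighting, not taken from the paper
) -> float:
    """Photometric plus weighted geometric error, both restricted to static samples."""
    photometric = static_masked_l1(rgb_hat, rgb, dynamic_prob)
    geometric = static_masked_l1(depth_hat, depth, dynamic_prob)
    return photometric + depth_weight * geometric
```

In this sketch a pixel with a high dynamic probability contributes nothing to either term, so a moving object cannot pull the pose or the map toward an inconsistent solution.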

Abstract

Simultaneous localization and mapping (SLAM) with implicit neural representations has received extensive attention due to its expressive representation power and its innovative continual-learning paradigm. However, deploying such a system in a dynamic environment has not been well studied. Such challenges are intractable even for conventional algorithms, since observations of dynamic objects from different views break geometric and photometric consistency, and this consistency is the foundation for jointly optimizing the camera pose and the map parameters. In this paper, we exploit the characteristics of continual learning and propose a novel SLAM framework for dynamic environments. While past efforts have sought to avoid catastrophic forgetting through experience replay, we view forgetting as a desirable characteristic. By adaptively controlling the replay buffer, the ambiguity caused by moving objects can be alleviated through forgetting. We restrain the replay of dynamic objects by introducing a continually learned classifier for dynamic object identification. The iterative optimization of the neural map and the classifier notably improves the robustness of the SLAM system in dynamic environments. Experiments on challenging datasets verify the effectiveness of the proposed framework.
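The replay control described in the abstract can be sketched as a buffer that skips samples a classifier flags as dynamic. This is a minimal illustration under stated assumptions, not the paper's method: `Sample`, `ForgettingReplayBuffer`, and the `is_dynamic` callable (standing in for the continually learned classifier) are hypothetical names, and uniform random sampling is an assumed replay policy.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    frame_id: int
    instance_id: int  # hypothetical label from an instance segmentation step


class ForgettingReplayBuffer:
    """Stores past samples and replays only those judged static."""

    def __init__(self, seed: int = 0) -> None:
        self._samples: List[Sample] = []
        self._rng = random.Random(seed)

    def add(self, sample: Sample) -> None:
        self._samples.append(sample)

    def replay(self, k: int, is_dynamic: Callable[[Sample], bool]) -> List[Sample]:
        # Samples flagged as dynamic are never replayed, so the map stops
        # being rehearsed on them and their influence fades over time.
        static = [s for s in self._samples if not is_dynamic(s)]
        return self._rng.sample(static, min(k, len(static)))


if __name__ == "__main__":
    buf = ForgettingReplayBuffer()
    for i, inst in enumerate([0, 1, 1, 2, 0]):
        buf.add(Sample(frame_id=i, instance_id=inst))
    # Stand-in classifier: treat instance 1 as a moving object.
    batch = buf.replay(4, is_dynamic=lambda s: s.instance_id == 1)
    print(sorted(s.frame_id for s in batch))
```

Filtering at replay time rather than at insertion time means that a change in the classifier's decision takes effect on the next replay without rebuilding the buffer.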

Paper Structure (20 sections, 13 equations, 14 figures, 4 tables)

This paper contains 20 sections, 13 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: We introduce a continual learning based SLAM framework under challenging dynamic environments (top row). The proposed method jointly learns a classifier to alleviate the effects induced by the moving objects (middle row), and a neural map to memorize past observations as a neural radiance field (bottom row). The iterative optimization of pose, map, and classifier parameters forms a robust SLAM system that learns to memorize and to forget adaptively in the changing open world.
  • Figure 1: The mask selection strategy helps to reduce a significant number of unnecessary masks, thereby lessening interference to the system.
  • Figure 2: Overview of the proposed method. We integrate an instance segmentation module, a visual encoder, and a continually learned classifier to achieve accurate dynamic object identification, enabling robust localization and mapping in complex dynamic environments.
  • Figure 2: Additional comparison results of mapping and tracking.
  • Figure 3: Divergence in the optimization of any one of the camera pose, the neural map, or the motion status classifier leads to high photometric and geometric errors.
  • ...and 9 more figures