Panoptic-SLAM: Visual SLAM in Dynamic Environments using Panoptic Segmentation

Gabriel Fischer Abati, João Carlos Virgolino Soares, Vivian Suzano Medeiros, Marco Antonio Meggiolaro, Claudio Semini

TL;DR

Panoptic-SLAM integrates panoptic segmentation with ORB-SLAM3 to achieve robust visual SLAM in dynamic environments, including scenes with unknown moving objects. The system runs four parallel threads and introduces a four-stage dynamic-keypoint-filtering pipeline that uses Things, Stuff, and Unknown masks to classify keypoints as static or dynamic, so that only static keypoints contribute to localization and mapping. Across TUM RGB-D, Bonn RGB-D, and real-world indoor experiments, Panoptic-SLAM demonstrates competitive or superior accuracy relative to state-of-the-art methods such as DynaSLAM and PVO, at the cost of panoptic inference that does not run in real time. The work also identifies limitations related to segmentation speed and illumination changes, and outlines future directions toward large-object handling and semantic map adaptation.
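As a rough illustration of the static/dynamic keypoint split described above, the sketch below discards keypoints that fall on pixels marked as moving. It is not the paper's four-stage pipeline: the function name `filter_dynamic_keypoints`, the single boolean `dynamic_mask`, and the nearest-pixel lookup are all assumptions made for this example, and how Things, Stuff, and Unknown masks are combined into such a mask is left out.

```python
def filter_dynamic_keypoints(keypoints, dynamic_mask):
    """Split keypoints into (static, dynamic) lists.

    keypoints: iterable of (x, y) pixel coordinates.
    dynamic_mask: list of rows of booleans (hypothetical input), True where
        a pixel is assumed to belong to a moving object.
    """
    height = len(dynamic_mask)
    width = len(dynamic_mask[0])
    static_kps, dynamic_kps = [], []
    for x, y in keypoints:
        # Round to the nearest pixel and clamp to the image bounds.
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        if dynamic_mask[row][col]:
            dynamic_kps.append((x, y))
        else:
            static_kps.append((x, y))
    return static_kps, dynamic_kps


# Toy 4x6 image with a moving region covering rows 1-2, columns 2-4.
mask = [[False] * 6 for _ in range(4)]
for r in (1, 2):
    for c in (2, 3, 4):
        mask[r][c] = True

kps = [(0.0, 0.0), (3.0, 1.0), (5.0, 3.0), (2.2, 2.4)]
static_kps, dynamic_kps = filter_dynamic_keypoints(kps, mask)
```

In this toy case the two keypoints inside the masked region end up in `dynamic_kps`, and only the remaining two would be passed on to pose estimation.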

Abstract

The majority of visual SLAM systems are not robust in dynamic scenarios. The ones that deal with dynamic objects in the scenes usually rely on deep-learning-based methods to detect and filter these objects. However, these methods cannot deal with unknown moving objects. This work presents Panoptic-SLAM, an open-source visual SLAM system robust to dynamic environments, even in the presence of unknown objects. It uses panoptic segmentation to filter dynamic objects from the scene during the state estimation process. Panoptic-SLAM is based on ORB-SLAM3, a state-of-the-art SLAM system for static environments. The implementation was tested using real-world datasets and compared with several state-of-the-art systems from the literature, including DynaSLAM, DS-SLAM, SaD-SLAM, PVO and FusingPanoptic. For example, Panoptic-SLAM is on average four times more accurate than PVO, the most recent panoptic-based approach for visual SLAM. Experiments were also performed using a quadruped robot with an RGB-D camera to test the applicability of our method in real-world scenarios. The tests were validated against ground truth obtained with a motion capture system.

Paper Structure

This paper contains 12 sections, 3 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Overview of Panoptic-SLAM filtering dynamic objects: (a) shows the initial feature detection, (b) shows the output of panoptic segmentation, (c) shows the masks of moving instances (person and box), and (d) shows the keypoints belonging to moving objects successfully filtered, even though the cardboard box is not in the list of trained classes
  • Figure 2: Framework of Panoptic-SLAM. The processes highlighted in yellow are the additional modules included in ORB-SLAM3 to allow dynamic keypoint filtering. The processes highlighted in blue describe the dynamic keypoint classification, which also runs in the tracking thread. To improve computational efficiency, panoptic segmentation (in green) runs in a separate thread
  • Figure 3: Example of an unknown moving object being filtered in a sequence of the Bonn RGB-D dynamic dataset
  • Figure 4: Comparison between the ground-truth and the trajectory estimated by Panoptic-SLAM in two dynamic sequences of the TUM RGB-D dataset
  • Figure 5: Comparison between the ground-truth and the trajectory estimated by ORB-SLAM3 (left) and Panoptic-SLAM (right) in the Non-obstructing box sequence of the Bonn RGB-D dynamic dataset
  • ...and 4 more figures
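
Trajectory comparisons like those in Figures 4 and 5 are often summarized with a single error number. The sketch below computes the root-mean-square of the per-pose position errors, commonly called absolute trajectory error (ATE) RMSE. This page does not say which metric the paper reports, so treating ATE as the metric is an assumption, as is the function name `ate_rmse`. The example also assumes the two trajectories are already time-associated and expressed in the same frame; the alignment step that real evaluations normally apply first is omitted.

```python
import math


def ate_rmse(ground_truth, estimate):
    """RMSE of position errors between two equally long, pre-aligned trajectories.

    ground_truth, estimate: sequences of (x, y, z) positions, where the i-th
    entries are assumed to correspond to the same timestamp.
    """
    if len(ground_truth) != len(estimate) or len(ground_truth) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((g - e) ** 2 for g, e in zip(gt_pos, est_pos))
        for gt_pos, est_pos in zip(ground_truth, estimate)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy example: only the middle pose deviates, by 0.5 m.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.4), (2.0, 0.0, 0.0)]
error = ate_rmse(gt, est)
```

A lower value means the estimated trajectory stays closer to the ground truth over the whole sequence.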