Panoptic-SLAM: Visual SLAM in Dynamic Environments using Panoptic Segmentation
Gabriel Fischer Abati, João Carlos Virgolino Soares, Vivian Suzano Medeiros, Marco Antonio Meggiolaro, Claudio Semini
TL;DR
Panoptic-SLAM integrates panoptic segmentation with ORB-SLAM3 to achieve robust visual SLAM in dynamic environments, including scenes with unknown moving objects. The system runs four parallel threads and introduces a four-stage dynamic-keypoint-filtering pipeline that uses Things, Stuff, and Unknown masks to classify keypoints as static or dynamic, so that localization and mapping rely only on static keypoints. Across TUM RGB-D, Bonn RGB-D, and real-world indoor experiments, Panoptic-SLAM achieves competitive or superior accuracy relative to state-of-the-art methods such as DynaSLAM and PVO. The main trade-off is that panoptic inference does not run in real time. The work also identifies limitations related to segmentation speed and illumination changes, and outlines future directions toward large-object handling and semantic map adaptation.
Abstract
The majority of visual SLAM systems are not robust in dynamic scenarios. Those that handle dynamic objects in the scene usually rely on deep-learning-based methods to detect and filter these objects. However, these methods cannot deal with unknown moving objects. This work presents Panoptic-SLAM, an open-source visual SLAM system that is robust to dynamic environments, even in the presence of unknown objects. It uses panoptic segmentation to filter dynamic objects from the scene during the state estimation process. Panoptic-SLAM is based on ORB-SLAM3, a state-of-the-art SLAM system for static environments. The implementation was tested using real-world datasets and compared with several state-of-the-art systems from the literature, including DynaSLAM, DS-SLAM, SaD-SLAM, PVO, and FusingPanoptic. For example, Panoptic-SLAM is on average four times more accurate than PVO, the most recent panoptic-based approach for visual SLAM. In addition, experiments were performed using a quadruped robot with an RGB-D camera to test the applicability of the method in real-world scenarios. These tests were validated against ground truth obtained with a motion capture system.
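The filtering idea summarized above, using panoptic masks to decide which keypoints count as static, can be illustrated with a minimal sketch. This is not the authors' four-stage pipeline or code from the Panoptic-SLAM repository. The category codes, the function name `filter_static_keypoints`, and the `moving_ids` set (standing in for whatever motion check flags an instance as dynamic) are all assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical category codes for a panoptic mask (not taken from the paper).
STUFF, THING, UNKNOWN = 0, 1, 2

Keypoint = Tuple[int, int]  # (x, y) pixel coordinates


def filter_static_keypoints(
    keypoints: Sequence[Keypoint],
    category_mask: Sequence[Sequence[int]],
    instance_mask: Sequence[Sequence[int]],
    moving_ids: Set[int],
) -> List[Keypoint]:
    """Keep keypoints that lie on Stuff or on instances not flagged as moving.

    moving_ids is assumed to come from a separate motion check
    that is outside the scope of this sketch.
    """
    static = []
    for x, y in keypoints:
        if category_mask[y][x] == STUFF:
            # Background regions are treated as static in this sketch.
            static.append((x, y))
        elif instance_mask[y][x] not in moving_ids:
            # Thing or Unknown instance that was not flagged as moving.
            static.append((x, y))
    return static


# Toy 2x4 image: per-pixel category and instance id.
category_mask = [
    [STUFF, THING, THING, UNKNOWN],
    [STUFF, STUFF, THING, UNKNOWN],
]
instance_mask = [
    [0, 1, 2, 3],
    [0, 0, 2, 3],
]
keypoints = [(0, 0), (1, 0), (2, 1), (3, 0)]
moving_ids = {1, 3}  # assumed output of a motion check

static_keypoints = filter_static_keypoints(
    keypoints, category_mask, instance_mask, moving_ids
)
print(static_keypoints)
```

In this toy example, the keypoints on the moving Thing (instance 1) and the moving Unknown region (instance 3) are discarded, while those on Stuff and on the stationary Thing are kept for pose estimation.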
