Dynamic semantic VSLAM with known and unknown objects

Sanghyoup Gu, Ratnesh Kumar

TL;DR

This work addresses dynamic Visual SLAM in scenes containing unknown objects by presenting a feature-based pipeline built atop ORB-SLAM2. It fuses unsupervised segmentation (Fast-SAM) with a YOLOv8 detection head to label known objects while generating unlabeled segments for unknowns, and leverages optical-flow gradients to reveal motion boundaries. A Semantic-Detection-Optical-flow (SDO) classifier labels segments as static or dynamic, followed by a consistency-driven scene-flow refinement that iteratively improves pose estimation by excluding dynamic features. Results show better performance than traditional VSLAM when unknown objects are present and performance competitive with leading semantic VSLAM methods when only known objects appear, demonstrating robustness in mixed-object environments and reducing reliance on predefined classes.

Abstract

Traditional Visual Simultaneous Localization and Mapping (VSLAM) systems assume a static environment, which makes them ineffective in highly dynamic settings. To overcome this, many approaches integrate semantic information from deep learning models to identify dynamic regions within images. However, these methods face a significant limitation: a supervised model cannot recognize objects that are not included in its training datasets. This paper introduces a novel feature-based Semantic VSLAM capable of detecting dynamic features in the presence of both known and unknown objects. By employing an unsupervised segmentation network, we achieve unlabeled segmentation, and then use an object detector to identify any of the known classes among those segments. We then pair this with computed high-gradient optical-flow information to classify the segmentations as static or dynamic for both known and unknown object classes. A consistency check module is also introduced for further refinement and final classification into static versus dynamic features. Evaluations using public datasets demonstrate that our method outperforms traditional VSLAM when unknown objects are present in the images, while still matching the performance of the leading semantic VSLAM techniques when the images contain only known objects.
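
To make the idea of pairing unlabeled segments with high-gradient optical flow more concrete, the following is a minimal toy sketch, not the authors' SDO classifier. It assumes a dense flow field and binary segment masks are already available, approximates the flow gradient by the largest flow difference to a 4-neighbour, and flags a segment as dynamic when most of its boundary pixels sit on a flow discontinuity. All function names and both thresholds (`grad_thresh`, `ratio_thresh`) are hypothetical choices for illustration only.

```python
import math
from typing import List, Tuple

Flow = List[List[Tuple[float, float]]]  # flow[y][x] = (u, v) displacement in pixels
Mask = List[List[bool]]                 # mask[y][x] is True inside the segment
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def flow_discontinuity(flow: Flow) -> List[List[float]]:
    """Largest flow difference to any 4-neighbour (a crude gradient stand-in)."""
    h, w = len(flow), len(flow[0])
    grad = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            u, v = flow[y][x]
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    nu, nv = flow[ny][nx]
                    grad[y][x] = max(grad[y][x], math.hypot(nu - u, nv - v))
    return grad


def boundary_pixels(mask: Mask) -> List[Tuple[int, int]]:
    """Mask pixels that have at least one in-image 4-neighbour outside the mask."""
    h, w = len(mask), len(mask[0])
    edge = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and any(
                0 <= y + dy < h and 0 <= x + dx < w and not mask[y + dy][x + dx]
                for dy, dx in NEIGHBOURS
            ):
                edge.append((y, x))
    return edge


def is_dynamic(mask: Mask, grad: List[List[float]],
               grad_thresh: float = 1.0, ratio_thresh: float = 0.5) -> bool:
    """Hypothetical rule: dynamic if enough boundary pixels show a flow jump."""
    edge = boundary_pixels(mask)
    if not edge:
        return False
    strong = sum(1 for y, x in edge if grad[y][x] > grad_thresh)
    return strong / len(edge) >= ratio_thresh


def static_keypoints(keypoints: List[Tuple[int, int]],
                     dynamic_masks: List[Mask]) -> List[Tuple[int, int]]:
    """Keep only (y, x) keypoints that fall outside every dynamic segment."""
    return [(y, x) for y, x in keypoints
            if not any(m[y][x] for m in dynamic_masks)]


def box_mask(h: int, w: int, y0: int, y1: int, x0: int, x1: int) -> Mask:
    """Rectangular mask covering rows y0..y1-1 and columns x0..x1-1."""
    return [[y0 <= y < y1 and x0 <= x < x1 for x in range(w)] for y in range(h)]


# Toy 6x6 scene: the background moves by (1, 0) because of camera motion,
# one unlabeled segment moves independently by (4, 0), another moves with
# the background and should therefore be treated as static.
H = W = 6
moving_mask = box_mask(H, W, 2, 4, 2, 4)
static_mask = box_mask(H, W, 0, 2, 0, 2)
flow = [[(4.0, 0.0) if moving_mask[y][x] else (1.0, 0.0) for x in range(W)]
        for y in range(H)]

grad = flow_discontinuity(flow)
dynamic = [m for m in (moving_mask, static_mask) if is_dynamic(m, grad)]
kept = static_keypoints([(2, 2), (0, 0), (5, 5)], dynamic)
print(kept)  # [(0, 0), (5, 5)]
```

In this toy scene only the independently moving segment is flagged, so the keypoint inside it is dropped while the keypoints on the static segment and the background remain available for pose estimation. A real system would also need the detector labels and the consistency check described in the abstract, which this sketch leaves out.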

Paper Structure

This paper contains 12 sections, 13 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Two stage approach
  • Figure 2: Framework and segmentation model
  • Figure 3: Optical flow gradient
  • Figure 4: Dynamic segmentations selection
  • Figure 5: Results of dynamic mask selection
  • ...and 3 more figures