MLP-SLAM: Multilayer Perceptron-Based Simultaneous Localization and Mapping

Taozhe Li, Wei Sun

TL;DR

This work addresses the challenge of SLAM performance degradation in dynamic outdoor environments by introducing an open-source MLP-based real-time stereo SLAM system that discriminates dynamic vs static feature points to preserve informative geometry. Built on ORB-SLAM2, the system adds a Depth Filter Module, a coarse pose estimation stage with object detection and tracking, an MLP-based discriminator using three error-based features, and a fine estimation stage that refines pose using static features. A new public dataset with over 50,000 labeled feature points enables direct evaluation of dynamic/static discrimination, and the method demonstrates superior average precision and faster speed on KITTI benchmarks compared with state-of-the-art dynamic SLAM methods. The authors also provide code and datasets on GitHub for reproducibility and broader usage.
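To make the discriminator idea concrete, here is a minimal sketch of how an MLP could map three error-based features for one feature point to a dynamic/static decision. It is not the authors' model: the layer sizes, the weights, the 0.5 threshold, and the names `dynamic_probability` and `is_dynamic` are all hypothetical, and the three inputs stand in for whichever error features the paper actually uses.

```python
import math

def dense(weights, bias, inputs):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical parameters (3 inputs -> 2 hidden units -> 1 output).
# A real discriminator would learn these from labeled feature points.
W1 = [[1.0, 1.0, 1.0], [0.5, 0.0, -0.5]]
B1 = [-1.5, 0.0]
W2 = [[2.0, 0.0]]
B2 = [-1.0]

def dynamic_probability(errors):
    """Return the (illustrative) probability that a feature point is dynamic."""
    if len(errors) != 3:
        raise ValueError("expected exactly three error features")
    hidden = relu(dense(W1, B1, errors))
    logit = dense(W2, B2, hidden)[0]
    return sigmoid(logit)

def is_dynamic(errors, threshold=0.5):
    """Classify a point as dynamic when its probability exceeds the threshold."""
    return dynamic_probability(errors) > threshold
```

With these placeholder weights, a point with uniformly small errors such as `[0.1, 0.1, 0.1]` is labeled static, while one with uniformly large errors such as `[2.0, 2.0, 2.0]` is labeled dynamic. Points labeled static would then remain available to the fine estimation stage.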

Abstract

Visual Simultaneous Localization and Mapping (V-SLAM) systems have seen significant development in recent years, demonstrating high precision in environments with few dynamic objects. However, their performance deteriorates significantly in settings with many movable objects, such as pedestrians, cars, and buses, which are common in outdoor scenes. To address this issue, we propose a Multilayer Perceptron (MLP)-based real-time stereo SLAM system that leverages complete geometry information to avoid information loss. Moreover, there is currently no publicly available dataset for directly evaluating dynamic and static feature classification methods; to bridge this gap, we have created a publicly available dataset containing over 50,000 feature points. Experimental results demonstrate that our MLP-based dynamic and static feature point discriminator outperforms other methods on this dataset. Furthermore, the MLP-based real-time stereo SLAM system achieves the highest average precision and fastest speed on the outdoor KITTI tracking datasets compared to other dynamic SLAM systems. The open-source code and datasets are available at https://github.com/TaozheLi/MLP-SLAM.

Paper Structure

This paper contains 22 sections, 7 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Qualitative comparison between our method and the method in DynaSLAM. Static feature points are shown as blue circles and dynamic feature points as yellow circles. The top image displays the results of our method, while the bottom image shows the results of the DynaSLAM method. In the left image, the DynaSLAM method classifies feature points on the highway guardrail as dynamic, whereas our method correctly predicts them as static. In the right image, the DynaSLAM method misclassifies numerous static feature points on cars parked at the roadside as dynamic, whereas our method correctly identifies them as static.
  • Figure 2: Diagram of our proposed MLP-based real-time stereo SLAM system, built on ORB-SLAM2. It has four main parts. The blue section covers pre-processing, feature extraction, object detection, and object tracking; the pre-processing and feature extraction parts use the default settings of ORB-SLAM2. The green section performs coarse estimation, generating a rough pose estimate. The pink section contains an MLP model that classifies feature points within potential dynamic objects as dynamic or static. The yellow section performs fine estimation, refining the camera pose from the coarse estimation stage by minimizing the re-projection error of all static feature points. These points include those outside the bounding boxes of potential dynamic objects, those filtered by the Depth Filter Module, and those classified as static by the MLP model.
  • Figure 3: Feature point flow among the modules of our SLAM system. Yellow circles represent sets of dynamic feature points, and blue circles represent sets of static feature points. Each square represents a module of our SLAM system; a square outlined with a dashed line indicates a module that has no impact on the feature point sets.
  • Figure 4: Training process of the optimal MLP model. The consistent decrease in both training and validation loss indicates successful learning. The accuracy metric, which is the ratio of the number of correctly classified cases to the total number of cases, is also shown to increase.
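
The accuracy metric described in the Figure 4 caption, the number of correctly classified cases divided by the total number of cases, can be sketched as follows. The function name `accuracy` and the 1 = dynamic, 0 = static label encoding are assumptions made for illustration; they are not taken from the paper or its code.

```python
def accuracy(predicted, actual):
    """Fraction of cases whose predicted label matches the ground-truth label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Assumed encoding: 1 = dynamic feature point, 0 = static feature point.
example_predicted = [1, 0, 1, 1]
example_actual = [1, 0, 0, 1]
example_accuracy = accuracy(example_predicted, example_actual)  # 3 of 4 correct
```

In this made-up example three of the four labels agree, so the accuracy is 0.75.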