A Visual-Inertial Motion Prior SLAM for Dynamic Environments

Weilong Sun, Yumin Zhang, Boren Wei

TL;DR

The paper tackles VI-SLAM in dynamic environments, where moving landmarks distort localization and contaminate maps. It proposes IDY-VINS, a two‑stage approach that (1) preprocesses landmarks using an inertial motion prior and epipolar constraints to identify dynamic candidates and (2) performs a robust, self-adaptive sliding-window BA that downweights dynamic candidates and stabilizes optimization, followed by post-processing to produce a clean static map. Experiments on VIODE and EuRoC show that IDY-VINS improves localization accuracy and maintains real-time performance compared with state‑of‑the‑art methods, especially in dynamic scenes. This work advances lifelong VI-SLAM by delivering a reliable static feature map and highlighting avenues for semantic integration and loop-closure enhancements.

Abstract

Visual-Inertial Simultaneous Localization and Mapping (VI-SLAM) algorithms, most of which rely on a static-scene assumption, are widely used in fields such as robotics, UAVs, VR, and autonomous driving. To overcome the localization risks that dynamic landmarks pose to most VI-SLAM systems, this paper proposes IDY-VINS, a robust visual-inertial motion prior SLAM system that uses an inertial motion prior to handle dynamic landmarks in environments with varying degrees of dynamics. Specifically, potential dynamic landmarks are preprocessed during the feature tracking phase using a probabilistic model of the landmarks' minimum projection errors, which are obtained from the inertial motion prior and the epipolar constraint. Subsequently, a robust and self-adaptive bundle adjustment residual is proposed that takes the minimum projection error prior into account for dynamic candidate landmarks. This residual is integrated into a sliding-window-based nonlinear optimization process to estimate camera poses, IMU states, and landmark positions while minimizing the impact of dynamic candidate landmarks that deviate from the motion prior. Finally, a clean point cloud map without the "ghosting effect" is obtained that contains only static landmarks. Experimental results demonstrate that the proposed system outperforms state-of-the-art methods in localization accuracy and time cost by robustly mitigating the influence of dynamic landmarks.
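
To make the preprocessing idea in the abstract concrete, the following is a minimal sketch of an epipolar check driven by a motion prior. It is not the authors' probabilistic minimum-projection-error model; it only computes the plain point-to-epipolar-line distance that such a model could build on. The function names, the convention X_{k+1} = R X_k + t, and the assumption that R and t come from IMU pre-integration between frames k and k+1 are all illustrative choices, and the code is written in dependency-free Python rather than with a linear algebra library.

```python
import math


def skew(t):
    """Return the 3x3 cross-product (skew-symmetric) matrix of a 3-vector."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def epipolar_distance(R, t, x_k, x_k1):
    """Distance from x_k1 to the epipolar line induced by x_k.

    R, t : assumed relative motion prior with X_{k+1} = R X_k + t
           (hypothetically taken from IMU pre-integration).
    x_k, x_k1 : normalized homogeneous image points (u, v, 1) of the
                same tracked feature in frames k and k+1.
    Returns None when the line is degenerate (for example, near-zero
    translation), because the constraint then carries no information.
    """
    E = matmul(skew(t), R)            # essential matrix E = [t]_x R
    line = matvec(E, x_k)             # epipolar line in frame k+1
    norm = math.hypot(line[0], line[1])
    if norm < 1e-12:
        return None
    return abs(sum(x_k1[i] * line[i] for i in range(3))) / norm


# Usage: camera motion prior is a pure shift along x, landmark at depth 5.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (1.0, 0.0, 0.0)
static_err = epipolar_distance(I3, t, (0.0, 0.0, 1.0), (0.2, 0.0, 1.0))
moving_err = epipolar_distance(I3, t, (0.0, 0.0, 1.0), (0.2, 0.1, 1.0))
```

In this toy setup the static landmark lies on its epipolar line, while the landmark that also moved vertically does not. A feature whose error stays large under the motion prior would be a natural dynamic candidate; the threshold or probability model used to make that decision is left open here because the summary above does not specify it.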

Paper Structure

This paper contains 15 sections, 10 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Point cloud map: The upper part is a point cloud map containing both static and dynamic landmarks. The dynamic landmarks include those that were excluded during the feature tracking phase with triangulation, as well as those that were confirmed after the nonlinear optimization process. The lower part is a clean map containing only static landmarks.
  • Figure 2: System overview: Our system consists of two main components: (1) sensor preprocessing, which obtains the landmarks' minimum projection errors from the inertial motion prior and the epipolar constraint in order to eliminate dynamic landmarks and mark dynamic candidate landmarks; and (2) sliding-window-based optimization with a robust and self-adaptive bundle adjustment residual, which minimizes the impact of dynamic candidate landmarks that deviate from the motion prior. A post-processing component then removes confirmed dynamic landmarks from the sliding window.
  • Figure 3: The epipolar constraint of dynamic landmarks between the $k$th frame and the $(k+1)$th frame.
  • Figure 4: Preprocessing of dynamic landmarks: The blue $\color{blue}\times$ marks the rejected dynamic landmarks, the green $\color{green}\bullet$ marks the dynamic candidate landmarks, and the red $\color{red}\bullet$ marks the static landmarks. (a) and (b): Comparison of our method and the RANSAC method on frames with large-scale and small-scale dynamic objects.
  • Figure 5: Sliding window containing static and dynamic candidate landmarks.
  • ...and 6 more figures