RD-VIO: Robust Visual-Inertial Odometry for Mobile Augmented Reality in Dynamic Environments

Jinyu Li; Xiaokun Pan; Gan Huang; Ziyang Zhang; Nan Wang; Hujun Bao; Guofeng Zhang

TL;DR

RD-VIO tackles two core challenges in mobile visual-inertial odometry: dynamic scenes with moving objects and pure-rotation motions that degrade depth estimation. The approach combines a two-stage IMU-guided outlier rejection (IMU-PARSAC) with a sliding-window VIO that adapts to degenerate motion using a subframe structure and deferred triangulation. Key contributions include the IMU-informed consensus for robust 3D-2D and 2D-2D matching, and a subframe BA framework that preserves stability during low-translation periods, enabling real-time performance on mobile devices. Extensive evaluation on EuRoC and ADVIO, plus online AR comparisons and a mobile AR demo, demonstrates improved robustness and competitive accuracy in dynamic and degenerate scenarios, with practical impact for real-time mobile AR applications.
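To make the deferred-triangulation (DT) idea more concrete, here is a minimal Python sketch built on our own assumptions. Each landmark stores its first observation as a world-frame camera center plus a rotation-compensated ray. It stays deferred while the angle between its rays is below a hypothetical `min_angle_deg` threshold. Once parallax is sufficient, it is triangulated with a swapped-in two-view midpoint method. The class and function names are illustrative and are not taken from the xrslam code base. The fake 1 m display depth follows the Figure 5 caption.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ray_angle(d1, d2):
    """Angle (radians) between two world-frame viewing rays."""
    return math.acos(max(-1.0, min(1.0, dot(normalize(d1), normalize(d2)))))


def midpoint_triangulate(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + u*d2."""
    d1, d2 = normalize(d1), normalize(d2)
    w = [a - b for a, b in zip(c1, c2)]
    b = dot(d1, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = 1.0 - b * b  # > 0 whenever the rays are not parallel
    s = (b * e - d) / denom
    u = (e - b * d) / denom
    p1 = [c + s * x for c, x in zip(c1, d1)]
    p2 = [c + u * x for c, x in zip(c2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]


class Landmark:
    """Hypothetical landmark record that stays deferred (DT) until parallax suffices."""

    def __init__(self, center, bearing):
        self.first = (center, bearing)  # world-frame camera center and ray direction
        self.position = None            # triangulated 3D point; None while deferred

    def add_observation(self, center, bearing, min_angle_deg=2.0):
        c1, d1 = self.first
        # Under pure rotation the camera center does not move, so the rays coincide
        # and the angle stays near zero: the landmark remains deferred.
        if self.position is None and math.degrees(ray_angle(d1, bearing)) >= min_angle_deg:
            self.position = midpoint_triangulate(c1, d1, center, bearing)

    def display_point(self):
        """Triangulated point, or a fake 1 m depth along the first ray while deferred."""
        if self.position is not None:
            return self.position
        c1, d1 = self.first
        return [c + x for c, x in zip(c1, normalize(d1))]
```

For example, a landmark first seen along +z from the origin stays deferred after a second, purely rotational view from the same center. A later view from `(1, 0, 0)` adds roughly 14° of parallax and places it at `(0, 0, 4)`.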

Abstract

It is typically challenging for visual or visual-inertial odometry systems to handle dynamic scenes and pure rotation. In this work, we design a novel visual-inertial odometry (VIO) system called RD-VIO that handles both problems. First, we propose an IMU-PARSAC algorithm that robustly detects and matches keypoints in a two-stage process. In the first stage, landmarks are matched with new keypoints using visual and IMU measurements. We collect statistical information from this matching and use it to guide the 2D-2D keypoint matching in the second stage. Second, to handle pure rotation, we detect the motion type and adapt the deferred-triangulation technique for use during data association. Pure-rotational frames are turned into special subframes, which provide additional constraints on the pure-rotational motion when solving the visual-inertial bundle adjustment. We evaluate the proposed VIO system on public datasets and through online comparisons. Experiments show that RD-VIO has clear advantages over other methods in dynamic environments. The source code is available at: https://github.com/openxrlab/xrslam.
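As a rough illustration of the two-stage idea, the Python sketch below proceeds in three steps. It gates 3D-2D matches by their reprojection error under an IMU-predicted pose. It then turns the inlier residual statistics (median plus a scaled MAD) into a threshold. Finally, it reuses that threshold for an epipolar cross-check against sliding-window keyframes. This is a simplified stand-in and not IMU-PARSAC: there is no sampling-based consensus, and the relative keyframe poses are assumed known. All names, thresholds, and the voting ratio are hypothetical.

```python
import math
from statistics import median

# Hypothetical two-stage outlier filter, NOT the paper's IMU-PARSAC.
# Poses are (R, t) as 3x3 / length-3 lists; image points are in normalized
# coordinates (intrinsics already removed).


def mat_vec(R, v):
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def project(R, t, X):
    """Project world point X with world-to-camera pose (R, t); None if behind camera."""
    Xc = [a + b for a, b in zip(mat_vec(R, X), t)]
    if Xc[2] <= 1e-9:
        return None
    return (Xc[0] / Xc[2], Xc[1] / Xc[2])


def stage1_3d2d(R_pred, t_pred, landmarks, keypoints, gate=0.05, k=3.0):
    """Gate 3D-2D matches by reprojection error under the IMU-predicted pose and
    derive a stage-2 threshold from the inlier residuals (median + k * scaled MAD)."""
    res = []
    for X, u in zip(landmarks, keypoints):
        p = project(R_pred, t_pred, X)
        res.append(math.inf if p is None else math.hypot(p[0] - u[0], p[1] - u[1]))
    inliers = [i for i, r in enumerate(res) if r < gate]
    if not inliers:
        return inliers, gate
    r_in = [res[i] for i in inliers]
    med = median(r_in)
    mad = median(abs(r - med) for r in r_in)
    # 1.4826 * MAD estimates a Gaussian standard deviation; the floor avoids a zero threshold
    return inliers, max(1e-4, med + k * 1.4826 * mad)


def epipolar_distance(R, t, x1, x2):
    """Distance from x2 to the epipolar line of x1, for relative pose X2 = R X1 + t."""
    a = mat_vec(R, [x1[0], x1[1], 1.0])
    # E = [t]_x R, so the epipolar line E x1 equals t x (R x1)
    line = [t[1] * a[2] - t[2] * a[1], t[2] * a[0] - t[0] * a[2], t[0] * a[1] - t[1] * a[0]]
    n = math.hypot(line[0], line[1])
    if n < 1e-12:
        return 0.0  # no translation: the epipolar constraint is uninformative
    return abs(line[0] * x2[0] + line[1] * x2[1] + line[2]) / n


def stage2_2d2d(tracks, rel_poses, threshold, min_ratio=0.5):
    """Keep a keypoint only if it is epipolar-consistent with at least min_ratio of the
    keyframes observing it. tracks: {kp_id: {kf_id: (x_kf, x_cur)}};
    rel_poses: {kf_id: (R, t)} mapping keyframe coordinates to the current frame."""
    kept = []
    for kp_id, obs in tracks.items():
        votes = [epipolar_distance(*rel_poses[kf], x_kf, x_cur) < threshold
                 for kf, (x_kf, x_cur) in obs.items()]
        if votes and sum(votes) >= min_ratio * len(votes):
            kept.append(kp_id)
    return kept
```

Both stages measure error in normalized image coordinates. This is what allows the stage-1 statistic to serve directly as the stage-2 threshold. With pixel units, the thresholds would need rescaling by the focal length.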

Paper Structure

This paper contains 27 sections, 14 equations, 12 figures, and 4 tables.

Introduction
Related Works
VIO and SLAM
SLAM in Dynamic Environments
SLAM under Degenerated Conditions
Approach
Sliding-Window VIO
Sliding-Window Optimization
Initialization
Keypoint Tracking
Outliers Detection and Removal
3D-2D Matching Stage
2D-2D Matching Stage
Pure-Rotation Detection and Delayed Triangulation
Sliding Window with Subframes
...and 12 more sections

Figures (12)

Figure 1: The proposed RD-VIO can robustly work in dynamic scenes with pure rotation motions, and outperforms some other SOTA VIO/VI-SLAM systems such as VINS-Mobile.
Figure 2: The pipeline of RD-VIO
Figure 3: Moving-outlier detection and removal strategy. In the mandatory 3D-2D stage, the current frame obtains initial matches between its 2D observations and the 3D points by optical-flow tracking against the last frame. IMU-PARSAC then filters out most outliers. In the optional 2D-2D stage, the current frame is matched against each keyframe in the sliding window using the original PARSAC algorithm, and the remaining dynamic outliers are removed through this multi-view cross-validation.
Figure 4: Geometry illustration of our angle-based pure-rotation detection. The maximum $\theta$ is reached when the two observation rays and the translation vector $t$ form an isosceles triangle.
Figure 5: Example point cloud from tracking while the camera is stationary. Blue points are DT landmarks, cast into points with a fake 1 m depth for visualization. Normal landmarks (in red) are scarce in Baseline-VIO because their depths diverge. With DT, more keypoints can be tracked. SF-VIO, on the other hand, keeps the depths stable.
...and 7 more figures
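The geometry in Figure 4 can be checked numerically. One reading of the figure assumes that every landmark lies at least a hypothetical `min_depth` from both camera centers. Under that assumption, the parallax angle θ subtended by a translation t is largest in the isosceles configuration, giving θ_max = 2·arcsin(|t| / (2·min_depth)). A frame could then be flagged as pure rotation when θ_max falls below a threshold. The depth bound, the 1° threshold, and the function names below are illustrative assumptions, not values from the paper.

```python
import math


def parallax_angle(a, b, baseline):
    """Angle at the landmark between two viewing rays of lengths a and b whose
    camera centers are `baseline` apart (law of cosines)."""
    c = (a * a + b * b - baseline * baseline) / (2.0 * a * b)
    return math.acos(max(-1.0, min(1.0, c)))


def max_parallax(t, min_depth):
    """Largest parallax a landmark at distance >= min_depth from both centers can
    show; reached in the isosceles case a = b = min_depth."""
    baseline = math.sqrt(sum(x * x for x in t))
    return 2.0 * math.asin(min(1.0, baseline / (2.0 * min_depth)))


def is_pure_rotation(t, min_depth=1.0, angle_threshold_deg=1.0):
    """Hypothetical test: translation too small to create useful parallax."""
    return math.degrees(max_parallax(t, min_depth)) < angle_threshold_deg
```

With `min_depth = 1.0`, a 1 mm translation yields at most about 0.06° of parallax and is flagged as pure rotation. A 10 cm translation allows about 5.7° and is not flagged. Any non-isosceles triangle with the same baseline, such as sides 1.0 and 1.05, gives a smaller angle.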
