XR-VIO: High-precision Visual Inertial Odometry with Fast Initialization for XR Applications

Shangjin Zhai, Nan Wang, Xiaomeng Wang, Danpeng Chen, Weijian Xie, Hujun Bao, Guofeng Zhang

TL;DR

The paper tackles robust, fast visual inertial odometry for XR by improving two modules: initialization and feature matching. It introduces XR-VIO, whose gyroscope-aided initialization pipeline (VG-SfM, VA-Align, VI-BA) can initialize from as few as four frames, and whose hybrid feature matching strategy fuses optical flow with descriptor-based matching to produce stable, long tracks. The approach achieves state-of-the-art accuracy on multiple public benchmarks (e.g., EuRoC, ZJU-Sensetime) and runs in real time on mobile devices, enabling practical AR/VR deployment. Experimental results, including ablations and mobile AR demos, show a higher initialization success rate, reduced drift, and robust performance across motion types. The authors acknowledge limitations in extreme or dynamic environments and suggest learning-based extensions as future work.

Abstract

This paper presents a novel approach to Visual Inertial Odometry (VIO), focusing on the initialization and feature matching modules. Existing methods for initialization often suffer from either poor stability in visual Structure from Motion (SfM) or fragility in solving a huge number of parameters simultaneously. To address these challenges, we propose a new pipeline for visual inertial initialization that robustly handles various complex scenarios. By tightly coupling gyroscope measurements, we enhance the robustness and accuracy of visual SfM. Our method demonstrates stable performance even with only four image frames, yielding competitive results. In terms of feature matching, we introduce a hybrid method that combines optical flow and descriptor-based matching. By leveraging the robustness of continuous optical flow tracking and the accuracy of descriptor matching, our approach achieves efficient, accurate, and robust tracking results. Through evaluation on multiple benchmarks, our method demonstrates state-of-the-art performance in terms of accuracy and success rate. Additionally, a video demonstration on mobile devices showcases the practical applicability of our approach in the field of Augmented Reality/Virtual Reality (AR/VR).
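To make the idea of a gyroscope-informed rotation estimate concrete, the sketch below integrates angular-velocity samples between two image frames into a relative rotation, which is the kind of quantity a visual SfM stage could use as a rotation prior. This is a generic illustration under stated assumptions, not the formulation used in the paper: the function names `so3_exp` and `integrate_gyro`, the constant-bias model, and the sample values are all hypothetical.

```python
import numpy as np


def so3_exp(phi):
    """Rodrigues formula: rotation vector (3,) -> rotation matrix (3, 3)."""
    theta = np.linalg.norm(phi)
    # Skew-symmetric matrix of phi.
    K = np.array([[0.0, -phi[2], phi[1]],
                  [phi[2], 0.0, -phi[0]],
                  [-phi[1], phi[0], 0.0]])
    if theta < 1e-8:
        # First-order approximation for very small angles.
        return np.eye(3) + K
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))


def integrate_gyro(omegas, dts, bias=None):
    """Compose per-sample rotations into one relative rotation.

    omegas: (N, 3) angular velocities in rad/s (hypothetical input layout).
    dts:    (N,) sample intervals in seconds.
    bias:   assumed-constant gyroscope bias, zero if not given.
    """
    bias = np.zeros(3) if bias is None else np.asarray(bias, dtype=float)
    R = np.eye(3)
    for w, dt in zip(omegas, dts):
        R = R @ so3_exp((np.asarray(w, dtype=float) - bias) * dt)
    return R


# Illustrative data: 100 samples at 200 Hz, rotating at 0.5 rad/s about z.
omegas = np.tile([0.0, 0.0, 0.5], (100, 1))
dts = np.full(100, 0.005)
R_rel = integrate_gyro(omegas, dts)
yaw = np.arctan2(R_rel[1, 0], R_rel[0, 0])  # total rotation about z, in rad
```

With a fixed rotation axis the per-sample rotations compose exactly, so the recovered yaw equals the angular rate times the elapsed time (0.5 rad/s over 0.5 s). A real pipeline would also have to estimate the bias and handle the camera-IMU extrinsic rotation, both of which are omitted here.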

Paper Structure

This paper contains 23 sections, 16 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: Pipeline of Visual Gyroscope tightly coupled SfM (VG-SfM)
  • Figure 2: Factor Graph of VG-BA
  • Figure 3: Hybrid Feature Matching Approach
  • Figure 4: Visualization of scale error on V2-01 (EuRoC). Fragments of poses are color-coded according to the magnitude of scale error for each initialization window in the dataset. Darker colors represent greater error, lighter colors indicate lower error, and black denotes failed initializations.
  • Figure 5: Cumulative distribution of initialization with different keyframes: 4KF and 5KF. Scale error, ATE and gravity RMSE are shown in 3 columns.
  • ...and 3 more figures