Adaptive VIO: Deep Visual-Inertial Odometry with Online Continual Learning

Youqi Pan, Wugen Zhou, Yingdian Cao, Hongbin Zha

TL;DR

Adaptive VIO tackles the generalization problem in monocular VIO by coupling two neural predictors, one for visual correspondence and one for IMU bias, with a differentiable, optimization-based back-end. Estimates optimized by visual-inertial bundle adjustment are fed back as self-supervised losses that refine both predictors, enabling online continual learning within a sliding window of frames. Results on EuRoC and TUM-VI show that the system adapts online, performs comparably to state-of-the-art optimization-based VIO, and outperforms prior learning-based VIO methods. The approach bridges learning and classical SLAM, allowing a VIO system to adapt online to new environments and sensor attributes.

Abstract

Visual-inertial odometry (VIO) has demonstrated remarkable success thanks to its low-cost, complementary sensors. However, existing VIO methods lack the generalization ability to adjust to different environments and sensor attributes. In this paper, we propose Adaptive VIO, a new monocular visual-inertial odometry system that combines online continual learning with traditional nonlinear optimization. Adaptive VIO comprises two networks that predict visual correspondence and IMU bias. Unlike end-to-end approaches that use networks to fuse the features from two modalities (camera and IMU) and predict poses directly, we combine neural networks with visual-inertial bundle adjustment in our VIO system. The optimized estimates are fed back to the visual and IMU bias networks, refining the networks in a self-supervised manner. This learning-optimization-combined framework and feedback mechanism enable the system to perform online continual learning. Experiments demonstrate that Adaptive VIO exhibits adaptive capability on the EuRoC and TUM-VI datasets. Its overall performance exceeds that of currently known learning-based VIO methods and is comparable to that of state-of-the-art optimization-based methods.
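The feedback mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear `BiasPredictor`, the `backend_optimize` stand-in for visual-inertial bundle adjustment, the squared-error loss, and all parameter values are assumptions made purely for illustration. The sketch only shows the loop structure: a predictor proposes an IMU bias, a back-end refines it, and the refined estimate is used as a self-supervised target for an online gradient step.

```python
import numpy as np


def backend_optimize(predicted_bias, true_bias, rng):
    """Hypothetical stand-in for visual-inertial bundle adjustment.

    A real back-end would refine the states from visual and inertial
    residuals, using the prediction as an initial guess. This toy version
    ignores the prediction and returns the true bias plus small noise.
    """
    return true_bias + 0.01 * rng.standard_normal(true_bias.shape)


class BiasPredictor:
    """Toy linear 'network' (an assumption): bias = W @ features + c."""

    def __init__(self, n_features, n_out=3):
        self.W = np.zeros((n_out, n_features))
        self.c = np.zeros(n_out)

    def predict(self, x):
        return self.W @ x + self.c

    def update(self, x, target, lr=0.05):
        """One SGD step on 0.5 * ||predict(x) - target||^2.

        Returns the RMS error measured before the step.
        """
        err = self.predict(x) - target
        self.W -= lr * np.outer(err, x)
        self.c -= lr * err
        return float(np.sqrt(np.mean(err ** 2)))


def run_online(n_windows=200, seed=0):
    """Run the predict -> optimize -> feed back loop over simulated windows."""
    rng = np.random.default_rng(seed)
    hidden_map = rng.standard_normal((3, 4))  # simulated features-to-bias relation
    model = BiasPredictor(n_features=4)
    errors = []
    for _ in range(n_windows):
        x = rng.standard_normal(4)            # simulated IMU features for one window
        true_bias = hidden_map @ x
        predicted = model.predict(x)          # network proposes a bias
        refined = backend_optimize(predicted, true_bias, rng)  # back-end refines it
        errors.append(model.update(x, refined))  # refined estimate supervises the network
    return errors


if __name__ == "__main__":
    errs = run_online()
    print("mean RMS error, first 10 windows:", np.mean(errs[:10]))
    print("mean RMS error, last 10 windows:", np.mean(errs[-10:]))
```

In this toy setting the prediction error shrinks as more windows are processed, because each refined estimate acts as a training label that needs no ground truth. The actual system applies the same idea to both the visual correspondence network and the IMU bias network.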

Paper Structure (14 sections, 18 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Frameworks of different VIO methods. Learning-based modules are colored in orange. Traditional computational modules are colored in green. (a) Classic optimization-based method. (b) End-to-end learning-based method. (c) Learning-optimization-combined method with online continual learning (ours).
  • Figure 2: The tracking pipeline of our VIO. The green modules (B, E, F) denote manually designed algorithms. The yellow (A) and orange (C, D) trapezoids represent modules implemented by neural networks; the orange modules can be refined through online continual learning.
  • Figure 3: Change in RMSE ATE on MH_01 (EuRoC) and room1 (TUM-VI) during online continual learning of the visual and IMU networks.
  • Figure 4: Comparison of estimated trajectories on the EuRoC dataset (two sub-figures on the left) and the TUM-VI dataset (two sub-figures on the right).