Dual-Agent Reinforcement Learning for Adaptive and Cost-Aware Visual-Inertial Odometry

Feiyang Pan, Shenghe Zheng, Chunyan Yin, Guangbin Dou

TL;DR

This work tackles the accuracy–efficiency trade-off in Visual-Inertial Odometry (VIO) by introducing a dual-agent reinforcement learning framework that reduces reliance on expensive Visual-Inertial Bundle Adjustment (VIBA). A Select Agent gates the VO frontend based on IMU data, while a Fusion Agent adaptively fuses IMU propagation with sparse VO updates, supported by an IMU Bias Estimator. Results on EuRoC and TUM-VI show strong accuracy with higher throughput and lower memory usage than GPU-based VO/VIO systems, and competitive accuracy relative to classical VIO backends. The approach targets resource-aware VIO on edge devices, remains robust under visual degradation, and leaves hardware-integrated deployment as future work.

Abstract

Visual-Inertial Odometry (VIO) is a critical component for robust ego-motion estimation, enabling foundational capabilities such as autonomous navigation in robotics and real-time 6-DoF tracking for augmented reality. Existing methods face a well-known trade-off: filter-based approaches are efficient but prone to drift, while optimization-based methods, though accurate, rely on computationally prohibitive Visual-Inertial Bundle Adjustment (VIBA) that is difficult to run on resource-constrained platforms. Rather than removing VIBA altogether, we aim to reduce how often and how heavily it must be invoked. To this end, we cast two key design choices in modern VIO (when to run the visual frontend and how strongly to trust its output) as sequential decision problems, and solve them with lightweight reinforcement learning (RL) agents. Our core contribution is a dual-pronged RL policy: (1) a Select Agent that gates the entire VO pipeline using only high-frequency IMU data; and (2) a composite Fusion Agent that first estimates a robust velocity state with a supervised network and then uses an RL policy to adaptively fuse the full (p, v, q) state. Experiments on the EuRoC MAV and TUM-VI datasets show that, in our unified evaluation, the proposed method achieves a more favorable accuracy-efficiency-memory trade-off than prior GPU-based VO/VIO systems: it attains the best average ATE while running up to 1.77 times faster and using less GPU memory. Compared to classical optimization-based VIO systems, our approach maintains competitive trajectory accuracy while substantially reducing computational load.
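
To make the gate-then-fuse control flow described in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: every name (`State`, `propagate`, `select_gate`, `fuse`, `step`) is hypothetical, the learned Select Agent is replaced by a simple acceleration-magnitude threshold, the learned Fusion Agent is replaced by a fixed blend weight, and orientation (q), bias estimation, and the supervised velocity network are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class State:
    p: Vec3  # position
    v: Vec3  # velocity


def propagate(state: State, accel_world: Vec3, dt: float) -> State:
    """Constant-acceleration IMU propagation (assumes gravity-free, world-frame accel)."""
    p = tuple(pi + vi * dt + 0.5 * ai * dt * dt
              for pi, vi, ai in zip(state.p, state.v, accel_world))
    v = tuple(vi + ai * dt for vi, ai in zip(state.v, accel_world))
    return State(p, v)


def select_gate(imu_window: Sequence[Vec3], threshold: float = 0.5) -> bool:
    """Stand-in for the Select Agent (assumption): request VO when the mean
    acceleration magnitude in the window exceeds a hand-set threshold."""
    mags = [sum(a * a for a in sample) ** 0.5 for sample in imu_window]
    return sum(mags) / len(mags) > threshold


def fuse(pred: State, vo_position: Vec3, weight: float) -> State:
    """Stand-in for the Fusion Agent (assumption): convex blend of the IMU-propagated
    position with the VO position; weight=1 trusts VO fully."""
    w = min(max(weight, 0.0), 1.0)
    p = tuple((1.0 - w) * pp + w * vp for pp, vp in zip(pred.p, vo_position))
    return State(p, pred.v)


def step(state: State, imu_window: Sequence[Vec3], dt: float,
         run_vo: Callable[[], Vec3], weight: float = 0.7) -> Tuple[State, bool]:
    """One cycle: always propagate with IMU; call the costly VO only if gated on."""
    for accel in imu_window:
        state = propagate(state, accel, dt)
    if select_gate(imu_window):
        return fuse(state, run_vo(), weight), True
    return state, False
```

In this sketch the expensive `run_vo` callback is invoked only when the gate fires, which is the source of the compute savings the abstract describes; a learned policy would replace both the threshold and the fixed `weight`.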

Paper Structure

This paper contains 25 sections, 19 equations, 9 figures, and 10 tables.

Figures (9)

  • Figure 1: The accuracy-efficiency trade-off in VIO. (a) The traditional tightly-coupled VIO framework, which relies on a monolithic and computationally expensive Visual-Inertial Bundle Adjustment (VIBA) block. (b) Our proposed decoupled RL-based framework. We mitigate the VIBA bottleneck by introducing two intelligent agents.
  • Figure 2: Overview of our proposed VIO pipeline. The system is composed of four decoupled modules: (1) IMU Preprocess, (2) Select Agent, (3) Visual Odometry, and (4) Fusion Agent. This framework leverages Reinforcement Learning to intelligently schedule and fuse sensor data, offering a highly computationally efficient alternative to traditional, tightly-coupled Visual-Inertial Bundle Adjustment.
  • Figure 3: Visual summary of initialization.
  • Figure 4: Ablation study for the IMU Bias Estimator. (a) Compares final ATE with different bias components enabled. (b) Compares ATE for different bias output strategies.
  • Figure 5: Training curve of the Select Agent using PPO.
  • ...and 4 more figures