Dual-Agent Reinforcement Learning for Adaptive and Cost-Aware Visual-Inertial Odometry
Feiyang Pan, Shenghe Zheng, Chunyan Yin, Guangbin Dou
TL;DR
This work tackles the accuracy–efficiency trade-off in Visual-Inertial Odometry (VIO) with a dual-agent reinforcement learning framework that reduces reliance on expensive Visual-Inertial Bundle Adjustment (VIBA). A Select Agent gates the VO frontend based on IMU data, while a Fusion Agent adaptively fuses IMU propagation with sparse VO updates, supported by an IMU Bias Estimator. Results on EuRoC and TUM-VI show strong accuracy with substantially higher throughput and lower memory usage than GPU-based VO/VIO systems, and competitive performance relative to classical VIO backends. The approach yields a resource-aware VIO frontend suited to edge devices that remains robust under visual degradation, and it points toward future hardware-integrated deployment.
Abstract
Visual-Inertial Odometry (VIO) is a critical component for robust ego-motion estimation, enabling foundational capabilities such as autonomous navigation in robotics and real-time 6-DoF tracking for augmented reality. Existing methods face a well-known trade-off: filter-based approaches are efficient but prone to drift, while optimization-based methods are accurate but rely on computationally expensive Visual-Inertial Bundle Adjustment (VIBA), which is difficult to run on resource-constrained platforms. Rather than removing VIBA altogether, we aim to reduce how often and how heavily it must be invoked. To this end, we cast two key design choices in modern VIO (when to run the visual frontend and how strongly to trust its output) as sequential decision problems and solve them with lightweight reinforcement learning (RL) agents. The core contribution of our framework is a dual-agent RL policy: (1) a Select Agent that gates the entire VO pipeline based only on high-frequency IMU data; and (2) a composite Fusion Agent that first estimates a robust velocity state via a supervised network and then uses an RL policy to adaptively fuse the full (p, v, q) state. Experiments on the EuRoC MAV and TUM-VI datasets show that, in our unified evaluation, the proposed method achieves a more favorable accuracy-efficiency-memory trade-off than prior GPU-based VO/VIO systems: it attains the best average ATE while running up to 1.77 times faster and using less GPU memory. Compared to classical optimization-based VIO systems, our approach maintains competitive trajectory accuracy while substantially reducing computational load.
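
To make the control flow of the two decisions concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified state holding only position and velocity (orientation q is omitted) and world-frame, gravity-compensated accelerations. It also replaces both learned agents with hand-written stand-ins: a threshold on recent IMU activity in place of the Select Agent, and a fixed convex blend in place of the Fusion Agent. All names (`select_action`, `fuse`, `run`, `vo_frontend`, `threshold`, `weight`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class State:
    p: Vec3  # position
    v: Vec3  # velocity


def propagate(state: State, accel: Vec3, dt: float) -> State:
    """Dead-reckon one IMU step (assumes world-frame, gravity-free acceleration)."""
    p = tuple(pi + vi * dt + 0.5 * ai * dt * dt
              for pi, vi, ai in zip(state.p, state.v, accel))
    v = tuple(vi + ai * dt for vi, ai in zip(state.v, accel))
    return State(p, v)


def select_action(imu_window: Sequence[Vec3], threshold: float = 0.5) -> bool:
    """Stand-in for the Select Agent: request VO only when the mean
    acceleration magnitude over the window exceeds a threshold."""
    mags = [sum(a * a for a in acc) ** 0.5 for acc in imu_window]
    return sum(mags) / len(mags) > threshold


def fuse(pred: State, vo_position: Vec3, weight: float) -> State:
    """Stand-in for the Fusion Agent: convex blend of the IMU-propagated
    position and the VO position; weight is clamped to [0, 1]."""
    w = min(max(weight, 0.0), 1.0)
    p = tuple((1.0 - w) * pp + w * vp for pp, vp in zip(pred.p, vo_position))
    return State(p, pred.v)


def run(imu: Sequence[Vec3], dt: float,
        vo_frontend: Callable[[int], Vec3],
        window: int = 4, weight: float = 0.7) -> Tuple[State, int]:
    """Propagate with the IMU at every step; call the (expensive) VO
    frontend only when the gate fires, then fuse its output."""
    state = State((0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
    vo_calls = 0
    for k, accel in enumerate(imu):
        state = propagate(state, accel, dt)
        recent = imu[max(0, k - window + 1): k + 1]
        if select_action(recent):
            state = fuse(state, vo_frontend(k), weight)
            vo_calls += 1
    return state, vo_calls
```

In this sketch the number of `vo_frontend` calls stands in for the visual cost that gating is meant to reduce. In the method described above, both the gate and the fusion weight would come from learned policies instead of fixed rules.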
