A Plug-and-Play Learning-based IMU Bias Factor for Robust Visual-Inertial Odometry

Yang Yi, Kunqing Wang, Jinpu Zhang, Zhen Tan, Xiangke Wang, Hui Shen, Dewen Hu

TL;DR

The paper tackles the instability of IMU bias estimation in visual-inertial odometry when visual information is unreliable by introducing IPNet, a non-recursive neural network that infers a stable IMU bias prior directly from raw IMU data. This prior is incorporated as a plug-and-play factor in the VIO backend, reducing bias-induced drift without propagating history-dependent errors. To train IPNet in the absence of ground-truth bias, the authors propose an iterative method to compute per-sequence average bias labels from pose ground truth, which are then used as supervision. Evaluations across public and in-house datasets show significant improvements in localization accuracy and robustness, with IPNet achieving real-time capable inference and reasonable generalization across platforms.
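To make the idea of a bias prior factor concrete, the following is a minimal sketch of how such a term could enter a least-squares backend. It is not the authors' implementation. The class name `BiasPriorFactor`, the six-element bias layout, and the per-axis standard deviations `sigma` are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BiasPriorFactor:
    """Hypothetical unary factor pulling the bias state toward a predicted prior."""

    prior: Sequence[float]  # assumed layout: [bax, bay, baz, bgx, bgy, bgz]
    sigma: Sequence[float]  # assumed per-axis standard deviations (all > 0)

    def residual(self, bias: Sequence[float]) -> List[float]:
        # Whitened residual: difference to the prior, scaled by its uncertainty.
        return [(b - p) / s for b, p, s in zip(bias, self.prior, self.sigma)]

    def cost(self, bias: Sequence[float]) -> float:
        # Standard least-squares cost, 0.5 * ||r||^2.
        return 0.5 * sum(r * r for r in self.residual(bias))


# Usage: the cost is zero at the prior and grows as the estimate moves away.
factor = BiasPriorFactor(prior=[0.1, 0.0, 0.0, 0.0, 0.0, 0.0], sigma=[0.1] * 6)
at_prior = factor.cost([0.1, 0.0, 0.0, 0.0, 0.0, 0.0])
one_sigma_off = factor.cost([0.2, 0.0, 0.0, 0.0, 0.0, 0.0])
```

In a real backend this term would be summed with the visual and IMU pre-integration costs, so a smaller `sigma` makes the optimizer trust the predicted prior more strongly.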

Abstract

Accurate and reliable estimation of the biases of low-cost Inertial Measurement Units (IMUs) is key to maintaining the resilience of Visual-Inertial Odometry (VIO), particularly when visual tracking fails in challenging areas. In such cases, bias estimates from the VIO can deviate significantly from the true values because of insufficient or erroneous visual features, compromising both localization accuracy and system stability. To address this challenge, we propose a novel plug-and-play module featuring the Inertial Prior Network (IPNet), which infers an IMU bias prior by implicitly capturing the motion characteristics of specific platforms. The core idea is motivated by the observation that different platforms exhibit distinctive motion patterns, whereas integrating low-cost IMU measurements suffers from unbounded error that accumulates quickly over time. These platform-specific motion patterns can therefore be exploited to infer the underlying IMU bias. In this work, we first infer the bias prior directly from raw IMU data using a sliding-window approach, removing the dependency on recursive bias estimation that combines visual features and thereby preventing error propagation in challenging areas. Moreover, to compensate for the lack of ground-truth bias in most visual-inertial datasets, we introduce an iterative method to compute the mean per-sequence IMU bias for network training and release it for the benefit of the community. The framework is trained and evaluated separately on two public datasets and a self-collected dataset. Extensive experiments show that our method significantly improves localization precision and robustness.
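The non-recursive, sliding-window inference described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the window length, the stride, the sample layout, and the function names are all hypothetical, and `placeholder_predictor` is a trivial stand-in for the network, not IPNet.

```python
from typing import Callable, Iterator, List, Sequence

Sample = Sequence[float]  # assumed layout: [ax, ay, az, gx, gy, gz]


def sliding_windows(samples: Sequence[Sample], window: int, stride: int) -> Iterator[List[Sample]]:
    """Yield fixed-length windows of raw IMU samples."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    for start in range(0, len(samples) - window + 1, stride):
        yield list(samples[start:start + window])


def infer_priors(
    samples: Sequence[Sample],
    predict_bias: Callable[[List[Sample]], List[float]],
    window: int,
    stride: int,
) -> List[List[float]]:
    """Predict one bias prior per window.

    Each prediction depends only on the samples inside its own window,
    so an error in one window is not carried into the next.
    """
    return [predict_bias(w) for w in sliding_windows(samples, window, stride)]


def placeholder_predictor(window_samples: List[Sample]) -> List[float]:
    """Stand-in for the network: per-axis mean of the window. NOT IPNet."""
    n = len(window_samples)
    return [sum(s[i] for s in window_samples) / n for i in range(6)]


# Usage with synthetic data: 10 samples, window of 4, stride of 2.
samples = [[float(k)] * 6 for k in range(10)]
priors = infer_priors(samples, placeholder_predictor, window=4, stride=2)
```

The point of the structure is that `predict_bias` takes no previous estimate as input, which is what distinguishes it from a recursive estimator.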

Paper Structure

This paper contains 28 sections, 7 equations, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: Our method ensures that the estimated IMU bias is more physically consistent, particularly when visual errors are significant, thereby mitigating the degradation of localization precision and system robustness.
  • Figure 2: Overview of the proposed system. Given image and IMU data, the framework outputs real-time pose information. First, the IMU data is passed through IPNet to estimate the current bias prior. The prior is then used in two ways: it is fed, along with the raw measurements, into the pre-integration module, whose IMU factors are optimized in the backend; and it is incorporated as a prior factor in the factor graph, imposing reasonable constraints on the bias optimization to keep the bias estimate robust.
  • Figure 3: The architecture of IPNet consists of an encoder block, a sequence modeling block, and a decoder block. The encoder block, composed mainly of four residual blocks and max-pooling layers, extracts features and reduces the data dimension. The sequence modeling block further captures temporal dependencies, and the decoder block performs regression to predict the accelerometer and gyroscope biases.
  • Figure 4: IMU bias calculation accuracy experiment. The horizontal axis shows the mean of the ground-truth bias, and the vertical axis shows the bias label calculated by our method. Ideally, the two are equal and the points lie on a straight line with slope 1 passing through the origin. The figure displays the bias distribution and the fitted line for different datasets. The gray area is the 95% confidence interval of the fitted curve, while the yellow area is the 95% prediction interval, which characterizes the expected range of future observations.
  • Figure 5: Schematic views of different scenes in challenging environments, corresponding to the following test sequences: (a) Seq04 of the In-house dataset, (b) Seq17 of the In-house dataset, (c) V2_02 of the EuRoC dataset, and (d) room_5 of the TumVi dataset. Because visual features are poor in these scenes, the trajectories diverge, affecting the robustness of the system.
  • ...and 5 more figures
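
The check described in the Figure 4 caption, that computed bias labels should lie on a line with slope 1 through the origin when plotted against the ground-truth means, can be sketched with an ordinary least-squares fit. This is only an illustration: the function name `fit_line` and the numbers in the usage example are made up here and are not data from the paper.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Usage with made-up values: labels that match the reference exactly
# give slope 1 and intercept 0; a constant offset shows up in the intercept.
reference = [0.0, 1.0, 2.0]
slope, intercept = fit_line(reference, [0.0, 1.0, 2.0])
slope_off, intercept_off = fit_line(reference, [0.5, 1.5, 2.5])
```

A slope far from 1 would indicate a scale error in the labels, while a nonzero intercept would indicate a constant offset.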