
MambaIO: Global-Coordinate Inertial Odometry for Pedestrians via Multi-Scale Frequency-Decoupled Modeling

Shanshan Zhang, Liqin Wu, Wenying Cao, Siyue Wang, Tianshui Wen, Qi Zhang, Xuemin Hong, Ao Peng, Lingxiang Zheng, Yu Yang

TL;DR

This work investigates whether the global coordinate frame provides better representations than the body frame for pedestrian inertial odometry. It finds the global frame generally advantageous, because the mapping between the IMU and the pedestrian's center of mass is non-rigid. Building on this, it introduces MambaIO, a frequency-decomposed IO framework that uses a differentiable Laplacian pyramid to split IMU signals into low- and high-frequency components, processes them with a Mamba (state-space) module and a multi-path convolution module, respectively, and fuses the results for pose regression. The approach achieves state-of-the-art localization accuracy across five public pedestrian IO datasets, with substantial improvements in both Absolute Trajectory Error (ATE) and Relative Trajectory Error (RTE) over strong baselines. The results demonstrate the value of multi-scale, frequency-aware learning for inertial-only localization and point to future work on refining high-frequency separation to further reduce interference noise.
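Moving IMU data into the global frame means rotating each body-frame sample by the device orientation at that instant. The NumPy sketch below illustrates only that rotation step. It assumes orientations arrive as quaternions in (w, x, y, z) order that map the body frame to the global frame. The function names are hypothetical, and the paper's actual preprocessing (for example, how gravity is handled) may differ.

```python
import numpy as np


def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix (normalized first)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def body_to_global(acc_body, gyro_body, quats):
    """Rotate per-sample body-frame IMU readings into the global frame.

    acc_body, gyro_body: (N, 3) arrays; quats: (N, 4) assumed body-to-global orientations.
    """
    rots = np.stack([quat_to_rotmat(q) for q in quats])     # (N, 3, 3)
    acc_global = np.einsum("nij,nj->ni", rots, acc_body)     # a_g[n] = R[n] @ a_b[n]
    gyro_global = np.einsum("nij,nj->ni", rots, gyro_body)   # w_g[n] = R[n] @ w_b[n]
    return acc_global, gyro_global
```

For example, a 90° yaw quaternion (cos 45°, 0, 0, sin 45°) maps a forward body-frame acceleration [1, 0, 0] to [0, 1, 0] in the global frame. A rotation about the vertical axis, such as [0, 0, 2], is unchanged by this quaternion.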

Abstract

Inertial Odometry (IO) enables real-time localization using only acceleration and angular velocity measurements from an Inertial Measurement Unit (IMU), making it a promising solution for consumer-grade localization applications. Researchers have traditionally transformed IMU measurements into the global frame to obtain smoother motion representations. However, recent studies in drone scenarios have shown that the body frame can significantly improve localization accuracy, prompting a re-evaluation of whether the global frame is suitable for pedestrian IO. To address this question, this paper systematically evaluates the effectiveness of the global frame in pedestrian IO through theoretical analysis, qualitative inspection, and quantitative experiments. Building on these findings, we further propose MambaIO, which decomposes IMU measurements into high-frequency and low-frequency components using a Laplacian pyramid. The low-frequency component is processed by a Mamba architecture to extract implicit contextual motion cues, while the high-frequency component is handled by a convolutional structure to capture fine-grained local motion details. Experiments on multiple public datasets show that MambaIO substantially reduces localization error and achieves state-of-the-art (SOTA) performance. To the best of our knowledge, this is the first application of the Mamba architecture to the IO task.
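The Laplacian-pyramid split can be illustrated with a fixed, non-learned 1D version. The NumPy sketch below assumes IMU windows shaped (time, channels) and a 5-tap binomial low-pass kernel, and all function names are hypothetical. MambaIO's pyramid is differentiable, and its filters, number of levels, and band-to-branch routing are not specified in this summary. The sketch therefore shows only the decomposition itself: the coarsest level plays the low-frequency role, and the per-level difference bands play the high-frequency role.

```python
import numpy as np

# 5-tap binomial low-pass kernel (a Gaussian approximation chosen for illustration)
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(x):
    """Low-pass filter each channel of x, shaped (T, C), along time with edge padding."""
    pad = len(KERNEL) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack(
        [np.convolve(xp[:, c], KERNEL, mode="valid") for c in range(x.shape[1])], axis=1
    )


def downsample(x):
    """Blur, then keep every other time step."""
    return blur(x)[::2]


def upsample(x, length):
    """Insert zeros back to `length` steps, then blur; the factor 2 restores amplitude."""
    up = np.zeros((length, x.shape[1]))
    up[::2] = x
    return 2.0 * blur(up)


def laplacian_split(x, levels=2):
    """Return (low, highs): the coarsest low-frequency band and one high-frequency band per level."""
    highs, current = [], x
    for _ in range(levels):
        down = downsample(current)
        highs.append(current - upsample(down, len(current)))  # detail lost by down/up-sampling
        current = down
    return current, highs


def laplacian_merge(low, highs):
    """Invert laplacian_split by upsampling and adding the bands back, finest level last."""
    current = low
    for high in reversed(highs):
        current = upsample(current, len(high)) + high
    return current
```

Consider a window `x` of shape (200, 6), with three accelerometer and three gyroscope channels. `laplacian_split(x, levels=2)` returns a (50, 6) low band and high bands of shapes (200, 6) and (100, 6). `laplacian_merge` then reconstructs `x` up to floating-point error. Reconstruction is lossless whatever kernel is chosen, because each high band stores exactly what down- and up-sampling removed.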

Paper Structure

This paper contains 21 sections, 13 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Performance comparison of algorithms on the RoNIN dataset in terms of ATE and RTE. Points closer to the lower-left corner indicate lower errors and thus higher localization accuracy.
  • Figure 2: Schematic diagram of the coordinate system transformation relationship between the IMU and the carrier's center of mass in UAV and pedestrian motion scenarios.
  • Figure 3: Comparison of the IO learning processes in the body frame and the global frame.
  • Figure 4: t-SNE visualization of features from the RIDI dataset, demonstrating that the global frame representation yields better separability of motion patterns.
  • Figure 5: Trajectory predictions by RoNIN ResNet on the RoNIN dataset under Body (blue) and Global (orange) coordinate representations. The Global-frame trajectory aligns more closely with the ground truth (green), indicating improved localization accuracy.
  • ...and 3 more figures
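Figure 1 compares methods by ATE and RTE. The sketch below implements one common formulation of the two metrics for 2D or 3D position arrays in meters. ATE is the position RMSE over the whole trajectory. RTE is the RMSE of the displacement error accumulated over a fixed window, slid along the trajectory. The window length (often a fixed time span converted to samples), the absence of trajectory alignment, and the function names are assumptions here. The paper's exact evaluation protocol may differ.

```python
import numpy as np


def ate(pred, gt):
    """Absolute Trajectory Error: RMSE of position error over the whole trajectory."""
    return float(np.sqrt(np.mean(np.sum((pred - gt) ** 2, axis=1))))


def rte(pred, gt, window):
    """Relative Trajectory Error: RMSE of the displacement error accumulated over
    `window` samples, evaluated for every window start along the trajectory."""
    if not 0 < window < len(gt):
        raise ValueError("window must be between 1 and len(gt) - 1 samples")
    drift_pred = pred[window:] - pred[:-window]  # predicted displacement per window
    drift_gt = gt[window:] - gt[:-window]        # ground-truth displacement per window
    return ate(drift_pred, drift_gt)
```

A constant-offset trajectory shows why both metrics are reported. A prediction shifted by a fixed 5 m from the ground truth has an ATE of 5 but an RTE of 0, because RTE only measures drift accumulated within each window.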