MambaIO: Global-Coordinate Inertial Odometry for Pedestrians via Multi-Scale Frequency-Decoupled Modeling
Shanshan Zhang, Liqin Wu, Wenying Cao, Siyue Wang, Tianshui Wen, Qi Zhang, Xuemin Hong, Ao Peng, Lingxiang Zheng, Yu Yang
TL;DR
This work investigates whether the global coordinate frame provides a better input representation than the body frame for pedestrian inertial odometry (IO). It finds the global frame generally advantageous because the mapping from the IMU to the pedestrian's center of mass is non-rigid. Building on this finding, it introduces MambaIO, a frequency-decomposed IO framework. MambaIO uses a differentiable Laplacian pyramid to split IMU signals into low- and high-frequency components, processes them with a Mamba (state-space) module and a multi-path convolution module, respectively, and fuses the results for pose regression. The approach achieves state-of-the-art localization accuracy on five public pedestrian IO datasets, with substantial reductions in both Absolute Trajectory Error ($ATE$) and Relative Trajectory Error ($RTE$) compared with strong baselines. The results demonstrate the value of multi-scale, frequency-aware learning for inertial-only localization. The authors identify refining the high-frequency separation, to further suppress interference noise, as future work.
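The frequency split summarized above can be illustrated with a one-dimensional Laplacian pyramid applied to a single IMU channel. This is a minimal, hypothetical sketch, not the MambaIO implementation. The 5-tap binomial kernel, reflect padding, factor-of-two resampling, and the helper names (`blur`, `downsample`, `upsample`, `laplacian_pyramid`, `reconstruct`) are all assumptions. Each detail band is the difference between a signal and the upsampled version of its low-passed, decimated copy. The coarsest residual therefore plays the role of the low-frequency component, and the detail bands carry the high-frequency content. Every step is linear (convolution and resampling), so the same construction written with tensor operations would be differentiable.

```python
import math
from typing import List, Tuple

# Assumed 5-tap binomial low-pass kernel (weights sum to 1).
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def _reflect(j: int, n: int) -> int:
    """Map an out-of-range index back into [0, n) by mirror reflection."""
    if n == 1:
        return 0
    while j < 0 or j >= n:
        if j < 0:
            j = -j
        if j >= n:
            j = 2 * (n - 1) - j
    return j


def blur(x: List[float]) -> List[float]:
    """Convolve x with KERNEL, using reflect padding at the borders."""
    n = len(x)
    half = len(KERNEL) // 2
    return [
        sum(w * x[_reflect(i + k - half, n)] for k, w in enumerate(KERNEL))
        for i in range(n)
    ]


def downsample(x: List[float]) -> List[float]:
    """Low-pass filter, then keep every second sample."""
    return blur(x)[::2]


def upsample(y: List[float], length: int) -> List[float]:
    """Zero-insert up to `length` samples, then low-pass and rescale by 2."""
    up = [0.0] * length
    for i, v in enumerate(y):
        if 2 * i < length:
            up[2 * i] = v
    return [2.0 * v for v in blur(up)]


def laplacian_pyramid(x: List[float], levels: int) -> Tuple[List[List[float]], List[float]]:
    """Return (detail bands ordered fine to coarse, low-frequency residual)."""
    details: List[List[float]] = []
    current = list(x)
    for _ in range(levels):
        if len(current) < 2:
            break
        low = downsample(current)
        approx = upsample(low, len(current))
        details.append([c - a for c, a in zip(current, approx)])
        current = low
    return details, current


def reconstruct(details: List[List[float]], low: List[float]) -> List[float]:
    """Invert the pyramid: upsample the residual and add back each band."""
    current = list(low)
    for band in reversed(details):
        approx = upsample(current, len(band))
        current = [a + b for a, b in zip(approx, band)]
    return current


# Synthetic "IMU channel": slow sway plus a faster oscillation.
signal = [math.sin(0.2 * t) + 0.3 * math.sin(2.5 * t) for t in range(64)]
details, low = laplacian_pyramid(signal, levels=3)
print([len(d) for d in details], len(low))  # [64, 32, 16] 8
rebuilt = reconstruct(details, low)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)) < 1e-9)  # True
```

In a pipeline like the one described, each IMU channel would be decomposed this way, with the residual routed to the sequence model and the detail bands to the convolutional branch. Because the detail bands plus the upsampled residual recover the input exactly, the split itself discards no information.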
Abstract
Inertial Odometry (IO) enables real-time localization using only the acceleration and angular velocity measured by an Inertial Measurement Unit (IMU), making it a promising solution for localization in consumer-grade applications. Researchers have traditionally transformed IMU measurements into the global frame to obtain smoother motion representations. However, recent studies on drones have shown that the body frame can significantly improve localization accuracy, prompting a re-evaluation of whether the global frame is suitable for pedestrian IO. To answer this question, this paper systematically evaluates the effectiveness of the global frame in pedestrian IO through theoretical analysis, qualitative inspection, and quantitative experiments. Building on these findings, we propose MambaIO, which decomposes IMU measurements into high-frequency and low-frequency components using a Laplacian pyramid. The low-frequency component is processed by a Mamba architecture to extract implicit contextual motion cues, while the high-frequency component is handled by a convolutional structure to capture fine-grained local motion details. Experiments on multiple public datasets show that MambaIO substantially reduces localization error and achieves state-of-the-art (SOTA) performance. To the best of our knowledge, this is the first application of the Mamba architecture to the IO task.
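To make the body-frame versus global-frame distinction concrete, this minimal sketch re-expresses one body-frame IMU sample (acceleration and angular velocity) in the global frame by rotating it with the device orientation. It is an illustrative assumption, not the paper's preprocessing. The quaternion convention (scalar-first `(w, x, y, z)`, Hamilton product, body-to-global), the source of the orientation estimate, and the names `quat_mul`, `rotate`, and `body_to_global` are hypothetical. Gravity is deliberately left in the acceleration.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit norm
Vec3 = Tuple[float, float, float]


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate vector v by unit quaternion q, computed as q * (0, v) * conj(q)."""
    q_conj = (q[0], -q[1], -q[2], -q[3])
    _, x, y, z = quat_mul(quat_mul(q, (0.0, *v)), q_conj)
    return (x, y, z)


def body_to_global(q_gb: Quat, acc_b: Vec3, gyro_b: Vec3) -> Tuple[Vec3, Vec3]:
    """Express one body-frame IMU sample in the global frame.

    q_gb is the (assumed) device orientation that maps body-frame vectors
    into the global frame, e.g. from an on-device orientation estimate.
    """
    return rotate(q_gb, acc_b), rotate(q_gb, gyro_b)


# Example: device yawed 90 degrees about the global vertical (z) axis,
# so the body x-axis maps onto the global y-axis; z components are unchanged.
half = math.radians(90) / 2
q = (math.cos(half), 0.0, 0.0, math.sin(half))
acc_g, gyro_g = body_to_global(q, (1.0, 0.0, 9.81), (0.0, 0.0, 0.5))
print(acc_g, gyro_g)
```

Body-frame inputs skip this step and feed the raw sensor vectors to the network. The global-frame representation instead depends on the accuracy of the orientation estimate used for the rotation.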
