IONext: Unlocking the Next Era of Inertial Odometry

Shanshan Zhang, Qi Zhang, Siyue Wang, Tianshui Wen, Liqin Wu, Ziheng Zhou, Xuemin Hong, Ao Peng, Lingxiang Zheng, Yu Yang

TL;DR

IONext tackles drift and generalization in inertial odometry by marrying CNN inductive bias with Transformer-inspired adaptability through the Adaptive Dynamic Encoder (ADE). ADE comprises ADM and AGU to jointly capture contextual motion and fine-grained local variations, enabling input-adaptive, multi-scale feature fusion. The work introduces the Absolute Length Error (ALE) metric with length-based normalization and demonstrates state-of-the-art performance across six public datasets, notably reducing errors on RNIN relative to strong baselines. This hybrid CNN-Transformer-inspired backbone offers robust, efficient IO suitable for infrastructure-free localization in diverse environments.

Abstract

Researchers have increasingly adopted Transformer-based models for inertial odometry. While Transformers excel at modeling long-range dependencies, their limited sensitivity to local, fine-grained motion variations and lack of inherent inductive biases often hinder localization accuracy and generalization. Recent studies have shown that incorporating large-kernel convolutions and Transformer-inspired architectural designs into CNNs can effectively expand the receptive field, thereby improving global motion perception. Motivated by these insights, we propose a novel CNN-based module called the Dual-wing Adaptive Dynamic Mixer (DADM), which adaptively captures both global motion patterns and local, fine-grained motion features from dynamic inputs. This module dynamically generates selective weights based on the input, enabling efficient multi-scale feature aggregation. To further improve temporal modeling, we introduce the Spatio-Temporal Gating Unit (STGU), which selectively extracts representative and task-relevant motion features in the temporal domain. This unit addresses the limitations of temporal modeling observed in existing CNN approaches. Built upon DADM and STGU, we present a new CNN-based inertial odometry backbone, named Next Era of Inertial Odometry (IONext). Extensive experiments on six public datasets demonstrate that IONext consistently outperforms state-of-the-art (SOTA) Transformer- and CNN-based methods. For instance, on the RNIN dataset, IONext reduces the average ATE by 10% and the average RTE by 12% compared to the representative model iMOT.
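
The idea of generating selective weights from the input to aggregate multi-scale features can be illustrated with a small NumPy sketch. This is not the authors' DADM: the two-branch layout, the pooled descriptor, the softmax selection, and every name here (`adaptive_multiscale_mix`, `w_select`, the kernel sizes) are assumptions chosen only to show how a small-kernel (local) branch and a large-kernel (global) branch might be fused with input-dependent weights.

```python
import numpy as np

def depthwise_conv1d(x, kernel):
    """Filter every channel with the same 1-D kernel, keeping the length.
    x: (channels, time); kernel: (k,) with k <= time.
    np.convolve flips the kernel, which is irrelevant for symmetric kernels."""
    return np.stack([np.convolve(ch, kernel, mode="same") for ch in x])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def adaptive_multiscale_mix(x, small_kernel, large_kernel, w_select):
    """Fuse a local and a global branch with input-dependent weights.
    w_select: (2, channels), a hypothetical projection from the pooled
    descriptor to one logit per branch (learned in a real model)."""
    local_feat = depthwise_conv1d(x, small_kernel)    # fine-grained variations
    global_feat = depthwise_conv1d(x, large_kernel)   # wider temporal context
    # Summarize the input by averaging both branches over time.
    descriptor = (local_feat + global_feat).mean(axis=1)   # (channels,)
    # The branch weights depend on the input through the descriptor.
    weights = softmax(w_select @ descriptor)               # (2,), sums to 1
    fused = weights[0] * local_feat + weights[1] * global_feat
    return fused, weights

# Toy usage: 6 channels (3-axis accelerometer + 3-axis gyroscope), 200 samples.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 200))
fused, weights = adaptive_multiscale_mix(
    x, np.ones(3) / 3, np.ones(31) / 31, rng.standard_normal((2, 6))
)
print(fused.shape, weights)
```

Because the weights are recomputed for each input window, different motion segments can lean more on the local or the global branch, which is the general behavior the abstract attributes to input-adaptive aggregation.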

Paper Structure

This paper contains 13 sections, 13 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Comparison of $\overline{ATE}$, $\overline{RTE}$, and $\overline{ALE}$ on the RNIN dataset. IONext achieves lower errors than baselines.
  • Figure 2: The overall architecture of the proposed IONext consists of the ADE, which comprises the ADM and the AGU.
  • Figure 3: Performance evaluation on the RNIN dataset. (a)–(b): ATE/RTE CDF curves of IONext vs. baselines. (c)–(d): Effects of adding modules to IONext and R-ResNet. Curves closer to the top-left indicate better performance.
  • Figure 4: Visualization of sample trajectories across six datasets, comparing IONext with two baseline models (R-ResNet and iMOT).
  • Figure 5: (a), (b) and (c) are radar plots of $\overline{RTE}$, $\overline{ALE}$, and $\overline{ATE}$, respectively, for R-ResNet, IONext (w/o AGU), and the Full IONext on six benchmark datasets. Smaller polygon areas indicate lower errors.
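
The CDF curves in Figure 3 can be reproduced in spirit with a short sketch. It assumes ATE is the root-mean-square position error over a whole trajectory, a definition commonly used in inertial odometry work; the paper's exact definitions of ATE, RTE, and ALE are not given in this summary, so the function names and the toy data below are hypothetical.

```python
import numpy as np

def ate(pred, gt):
    """Root-mean-square position error between two (N, 2) trajectories.
    Assumed definition; the paper may define ATE differently."""
    return float(np.sqrt(np.mean(np.sum((pred - gt) ** 2, axis=1))))

def empirical_cdf(errors):
    """Return sorted errors x and the fraction y of samples <= each x."""
    x = np.sort(np.asarray(errors, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

# Toy usage: one trajectory offset by (3, 4) metres from ground truth.
gt = np.zeros((5, 2))
pred = gt + np.array([3.0, 4.0])
print(ate(pred, gt))

# Per-trajectory errors for a hypothetical model, turned into a CDF.
x, y = empirical_cdf([3.0, 1.0, 2.0, 4.0])
print(x, y)
```

A curve closer to the top-left means that, at any given error threshold, a larger fraction of trajectories falls below that threshold.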