iMoT: Inertial Motion Transformer for Inertial Navigation

Son Minh Nguyen, Linh Duy Tran, Duc Viet Le, Paul J. M. Havinga

TL;DR

iMoT tackles inertial odometry by fusing acceleration and angular velocity signals in a Transformer-based encoder–decoder. The encoder introduces a Progressive Series Decoupler (PSD) to extract informative temporal components, Adaptive Positional Encoding (APE) to align positions across modalities, and Adaptive Spatial Sync (ASC) to preserve cross-channel details. The decoder uses learnable query motion particles, refined via a dynamic scoring mechanism (DSM), to model multiple motion modes. The approach is reported to achieve state-of-the-art robustness and accuracy on four large inertial datasets, especially in unseen dynamic scenarios, which indicates strong generalization in trajectory reconstruction. By jointly modeling cross-modal cues and motion uncertainty in one Transformer framework, it targets more reliable inertial navigation for AR/VR, robotics, and related domains.

Abstract

We propose iMoT, an innovative Transformer-based inertial odometry method that retrieves cross-modal information from motion and rotation modalities for accurate positional estimation. Unlike prior work, during the encoding of the motion context, we introduce a Progressive Series Decoupler at the beginning of each encoder layer to highlight critical motion events inherent in acceleration and angular velocity signals. To better aggregate cross-modal interactions, we present Adaptive Positional Encoding, which dynamically adjusts positional embeddings to account for temporal discrepancies between modalities. During decoding, we introduce a small set of learnable query motion particles as priors to model motion uncertainties within velocity segments. Each query motion particle is intended to draw cross-modal features dedicated to a specific motion mode, and together the particles allow the model to refine its understanding of motion dynamics. Lastly, we design a dynamic scoring mechanism to stabilize iMoT's optimization by considering all aligned motion particles at the final decoding step, ensuring robust and accurate velocity segment estimation. Extensive evaluations on various inertial datasets demonstrate that iMoT significantly outperforms state-of-the-art methods in the robustness and accuracy of trajectory reconstruction.
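
The decoding idea described above (a few query particles that each gather context features, followed by a scoring step over all particles) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the single-head dot-product attention, the linear heads, and the softmax score-weighted blending are all assumptions chosen for illustration, and the "learnable" parameters are simply random values here because nothing is trained.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, context):
    """Single-head scaled dot-product attention of one query over context tokens."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, tok) / scale for tok in context])
    dim = len(context[0])
    return [sum(w * tok[d] for w, tok in zip(weights, context)) for d in range(dim)]


def decode_velocity(particles, context, vel_head, score_head):
    """Let each particle attend to the context, then blend per-particle velocities.

    vel_head and score_head are hypothetical linear heads (assumptions):
    one maps a feature vector to a 2D velocity, the other to a scalar score.
    """
    refined = [attend(q, context) for q in particles]
    velocities = [[dot(row, r) for row in vel_head] for r in refined]
    scores = softmax([dot(score_head, r) for r in refined])
    blended = [sum(s * v[i] for s, v in zip(scores, velocities)) for i in range(2)]
    return blended, scores, velocities


random.seed(0)
DIM, TOKENS, PARTICLES = 4, 6, 3
# Stand-ins for encoder output tokens and for the query motion particles.
context = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(TOKENS)]
particles = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(PARTICLES)]
vel_head = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(2)]
score_head = [random.gauss(0, 1) for _ in range(DIM)]

velocity, scores, candidates = decode_velocity(particles, context, vel_head, score_head)
print("scores:", [round(s, 3) for s in scores])
print("blended velocity:", [round(v, 3) for v in velocity])
```

Because the scores come from a softmax, the blended velocity is a convex combination of the per-particle candidates, so no single particle has to be correct on its own. How iMoT actually scores and aligns its particles is specified in the paper, not in this sketch.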

Paper Structure

This paper contains 26 sections, 10 equations, 14 figures, 3 tables.

Figures (14)

  • Figure 1: iMoT architecture: (1) The Encoder synthesizes motion context features from Acceleration and Angular Velocity tokens, incorporating three key innovations: Progressive Series Decoupler for enhanced information absorption, Adaptive Positional Encoding to handle modality differences, and Adaptive Spatial Sync to maintain cross-channel interactions. (2) The Decoder manipulates query motion particles to capture motion variability within velocity segments through cross-modal information retrieval. Two signal types are highlighted: the blue flow for base signals like sinusoidal encoding and the magenta flow for controlling signals, including tokens and velocity particles, making static modules or operations responsive to changes.
  • Figure 2: Progressive Series Decoupler.
  • Figure 3: Adaptive Spatial Sync.
  • Figure 4: Ablation study on the number of query motion particles on the RoNIN dataset.
  • Figure 5: Cumulative Error Distributions (CDF) for three metric types, and boxplot of PDE on the RoNIN dataset.
  • ...and 9 more figures