KISS-IMU: Self-supervised Inertial Odometry with Motion-balanced Learning and Uncertainty-aware Inference

Jiwon Choi, Hogyun Kim, Geonmo Yang, Juhui Lee, Younggun Cho

TL;DR

KISS-IMU is a self-supervised inertial odometry framework that eliminates ground truth dependency by using simple LiDAR-based ICP registration and pose graph optimization as the supervisory signal. This lets it remain robust without joint multi-modal learning or ground truth supervision.

Abstract

Inertial measurement units (IMUs), which provide high-frequency linear acceleration and angular velocity measurements, serve as fundamental sensing modalities in robotic systems. Recent advances in deep neural networks have led to remarkable progress in inertial odometry. However, the heavy reliance on ground truth data during training fundamentally limits scalability and generalization to unseen and diverse environments. We propose KISS-IMU, a novel self-supervised inertial odometry framework that eliminates ground truth dependency by leveraging simple LiDAR-based ICP registration and pose graph optimization as a supervisory signal. Our approach embodies two key principles: keeping the IMU stable through motion-aware balanced training and keeping the IMU strong through uncertainty-driven adaptive weighting during inference. To evaluate performance across diverse motion patterns and scenarios, we conducted comprehensive experiments on various real-world platforms, including quadruped robots. Importantly, we train only the IMU network in a self-supervised manner, with LiDAR serving solely as a lightweight supervisory signal rather than requiring additional learnable processes. This design enables the framework to ensure robustness without relying on joint multi-modal learning or ground truth supervision. The supplementary materials are available at https://sparolab.github.io/research/kiss_imu.
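The "uncertainty-driven adaptive weighting" idea can be illustrated with a toy inverse-variance fusion of two scalar estimates. This is only a sketch under stated assumptions, not the paper's pose graph optimization: the function name `fuse_inverse_variance` and all numbers are hypothetical, and a real pose graph works on full poses with information matrices rather than scalars.

```python
def fuse_inverse_variance(estimates, variances):
    """Fuse scalar estimates, weighting each one by 1 / variance.

    Toy stand-in for confidence-aware weighting: a sensor that reports
    a smaller variance (higher confidence) pulls the result toward itself.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused = sum(w * x for w, x in zip(weights, estimates)) / total
    fused_variance = 1.0 / total  # never larger than the smallest input variance
    return fused, fused_variance


# Hypothetical forward displacement (metres) over one interval.
imu_step, imu_var = 1.00, 0.04      # assumed IMU estimate and predicted variance
lidar_step, lidar_var = 1.20, 0.01  # assumed LiDAR registration estimate and variance

fused_step, fused_var = fuse_inverse_variance(
    [imu_step, lidar_step], [imu_var, lidar_var]
)
print(fused_step, fused_var)
```

With these made-up numbers the LiDAR estimate receives four times the weight of the IMU estimate, so the fused value lands closer to it. Raising `lidar_var` (for example, in a geometrically degenerate scene) shifts the result back toward the IMU.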

Paper Structure (22 sections, 15 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: KISS-IMU performance on the unseen LAWN sequence (trained on Forest; both are from DiTer++). (a) Training without Gaussian mixture model (GMM) weighting: imbalanced motion components bias learning toward dominant patterns, causing trajectory drift and poor generalization. (b) Training with GMM weighting: balanced motion components mitigate this bias and improve generalization. Red boxes in (a) mark failures in regions with rare motion components, while blue boxes in (b) highlight challenging regions corrected through motion-aware reweighting.
  • Figure 2: Our inertial odometry (IO) framework follows the Keep IMU Stable and Strong philosophy through three components: (a) Self-supervised training combines an IMU network for correction and uncertainty prediction with a LiDAR registration module for geometric constraints. Pose graph optimization (PGO) fuses both modalities, followed by selective pseudo-label generation using symmetric overlap scores for supervision without ground truth. (b) GMM analysis stabilizes IO via motion clustering. While the original GMM reveals imbalanced motion distributions, our balancing strategy ensures uniform coverage, with reweighting emphasizing underrepresented motions during optimization. (c) Sensor confidence-aware PGO strengthens IO through adaptive weighting. Learned uncertainties from IMU and LiDAR modules enable dynamic confidence adjustment during inference, ensuring robustness across varying motion and sensor conditions.
  • Figure 3: Our proposed network architecture. A CNN-GRU encoder processes raw IMU measurements (i.e., $\mathbf{m}_k = [\boldsymbol{\omega}_k, \boldsymbol{\alpha}_k]^T$) from $\mathcal{M}_{i,i+1}$ to extract features and estimate learned corrections $\hat{\boldsymbol{\sigma}}_k$ and uncertainties $\hat{\boldsymbol{\eta}}_k$. An integrator combines these with the previous state to produce the corrected measurements in Eq. \ref{equ:corrected_meas} and the associated $\hat{\boldsymbol{\eta}}_k$.
  • Figure 4: t-SNE visualization of motion pattern clustering with quantitative evaluation using the silhouette score \cite{rousseeuw1987silhouettes} and the adjusted Rand index \cite{hubert1985comparing}. Motion samples from GMM-determined labels (left turn, right turn, straight), identified via ground truth velocity analysis, are shown in seen (Forest) and unseen (LAWN) environments. Without GMM balancing, clustering exhibits limited separation, whereas GMM-based training improves inter-class separation and intra-class compactness. These results indicate better generalization in embedding representations through motion-aware reweighting.
  • Figure 5: Motion pattern analysis using GMM across DiTer++ sequences. Using the Bayesian information criterion \cite{wan2019novel}, we determine the optimal number of components as $G = 7$. GMM distributions of normalized angular speed reveal motion characteristics for (a) Forest, (b) LAWN, and (c) PARK sequences. The Wasserstein distance between Forest and LAWN is lower than the distance to PARK, indicating that Forest and LAWN have more similar motion distributions.
  • ...and 4 more figures
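
Two ideas from the captions above, motion-aware reweighting (Figures 1 and 2) and the Wasserstein comparison of angular-speed distributions (Figure 5), can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The cluster labels below stand in for GMM component assignments, inverse-frequency weighting is one common balancing choice that the source does not confirm, and the function names and sample values are hypothetical.

```python
from collections import Counter


def balanced_weights(labels):
    """Per-sample weights that give every cluster the same total weight.

    weight = n / (k * count), so rare motion clusters are up-weighted
    and the mean weight over all samples is 1.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[label]) for label in labels]


def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size 1D samples.

    For equal-size empirical distributions this is the mean absolute
    difference between the sorted samples.
    """
    if not a or len(a) != len(b):
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Hypothetical cluster assignments: straight motion dominates, right turns are rare.
labels = ["straight"] * 6 + ["left"] * 3 + ["right"] * 1
weights = balanced_weights(labels)
print(weights[0], weights[6], weights[9])  # straight, left, right

# Hypothetical normalized angular speeds from three sequences.
seq_a = [0.10, 0.20, 0.30, 0.40]
seq_b = [0.15, 0.25, 0.35, 0.45]
seq_c = [0.60, 0.70, 0.80, 0.90]
print(wasserstein_1d(seq_a, seq_b), wasserstein_1d(seq_a, seq_c))
```

In a training loop, such weights would typically multiply the per-sample loss so that underrepresented motions contribute as much as dominant ones. In the toy data, `seq_a` is closer to `seq_b` than to `seq_c`, mirroring the kind of comparison the Figure 5 caption describes.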