IMUOptimize: A Data-Driven Approach to Optimal IMU Placement for Human Pose Estimation with Transformer Architecture

Varun Ramani, Hossein Khayami, Yang Bai, Nakul Garg, Nirupam Roy

TL;DR

This work tackles IMU-based human pose estimation by combining a data-driven approach to optimal sensor placement with a transformer-based time-series model. By first identifying informative IMU locations across a 24-joint SMPL skeleton and then applying a transformer encoder, the method outperforms prior DIP-IMU baselines in pose reconstruction with both dense (24 IMUs) and sparse (6 IMUs) sensor configurations. The transformer not only improves accuracy but also offers substantial training speed advantages over LSTM-based approaches. A key finding is that optimal IMU placements are highly dataset- and model-dependent, underscoring the need for context-aware sensor selection in IMU-based pose estimation tasks.

Abstract

This paper presents a novel approach for predicting human poses using IMU data, diverging from previous studies such as DIP-IMU, IMUPoser, and TransPose, which use up to 6 IMUs in conjunction with bidirectional RNNs. We introduce two main innovations: a data-driven strategy for optimal IMU placement and a transformer-based model architecture for time series analysis. Our findings indicate that our approach outperforms traditional 6-IMU biRNN models, and that the transformer architecture significantly enhances pose reconstruction from data obtained from 24 IMU locations while matching the performance of biRNNs when using only 6 IMUs. The enhanced accuracy provided by our optimally chosen locations, coupled with the parallelizability and performance of transformers, offers significant improvements to the field of IMU-based pose estimation.
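
The abstract does not spell out how the data-driven placement is computed, although the figure titles refer to feature ablation. As a rough illustration only, the sketch below runs a greedy backward ablation over 24 candidate sensor locations until 6 remain. Everything in it is an assumption made for the example: the synthetic data, the least-squares model standing in for the paper's biRNN and transformer, and the hypothetical names `validation_error`, `ablate_sensors`, and `informative`. It is not the authors' procedure.

```python
import numpy as np


def validation_error(X_train, y_train, X_val, y_val, sensors):
    """Fit least squares on the chosen sensors and return validation MSE."""
    A = X_train[:, sensors, :].reshape(len(X_train), -1)
    B = X_val[:, sensors, :].reshape(len(X_val), -1)
    w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return float(np.mean((B @ w - y_val) ** 2))


def ablate_sensors(X_train, y_train, X_val, y_val, keep=6):
    """Greedy backward ablation: repeatedly drop the least useful sensor."""
    sensors = list(range(X_train.shape[1]))
    while len(sensors) > keep:
        # Error obtained when each remaining candidate is removed in turn.
        errors = [
            validation_error(
                X_train, y_train, X_val, y_val,
                [s for s in sensors if s != candidate],
            )
            for candidate in sensors
        ]
        # Drop the sensor whose removal hurts validation error the least.
        sensors.pop(int(np.argmin(errors)))
    return sensors


# Synthetic stand-in data (assumption): 24 locations, 3 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 24, 3))
informative = [0, 4, 7, 8, 18, 19]  # hypothetical "useful" locations
y = X[:, informative, :].sum(axis=(1, 2)) + 0.1 * rng.normal(size=600)

X_train, y_train = X[:400], y[:400]
X_val, y_val = X[400:], y[400:]

selected = ablate_sensors(X_train, y_train, X_val, y_val, keep=6)
print("selected sensor locations:", selected)
```

In a real setting the scoring function would retrain or re-evaluate the pose model itself, which is one plausible reason the selected locations can differ between datasets and architectures, as the TL;DR notes.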

Paper Structure

This paper contains 12 sections, 1 equation, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Global BiRNN Feature Ablation
  • Figure 2: Global Transformer Feature Ablation
  • Figure 3: ACCAD BiRNN Feature Ablation
  • Figure 4: BioMotionLab_NTroj BiRNN Feature Ablation
  • Figure 5: CMU BiRNN Feature Ablation
  • ...and 5 more figures