IMUOptimize: A Data-Driven Approach to Optimal IMU Placement for Human Pose Estimation with Transformer Architecture
Varun Ramani, Hossein Khayami, Yang Bai, Nakul Garg, Nirupam Roy
TL;DR
This work tackles IMU-based human pose estimation by combining a data-driven approach to optimal sensor placement with a transformer-based time-series model. The method first identifies informative IMU locations across a 24-joint SMPL skeleton and then applies a transformer encoder, achieving better pose reconstruction than prior DIP-IMU baselines with both dense (24 IMUs) and sparse (6 IMUs) sensor configurations. The transformer not only improves accuracy but also trains substantially faster than LSTM-based approaches. A key finding is that optimal IMU placements are highly dataset- and model-dependent, underscoring the need for context-aware sensor selection in IMU-based pose estimation tasks.
Abstract
This paper presents a novel approach for predicting human poses from IMU data, diverging from previous studies such as DIP-IMU, IMUPoser, and TransPose, which use up to 6 IMUs in conjunction with bidirectional RNNs (biRNNs). We introduce two main innovations: a data-driven strategy for optimal IMU placement and a transformer-based model architecture for time-series analysis. Our findings indicate not only that our approach outperforms traditional 6-IMU biRNN models, but also that the transformer architecture significantly enhances pose reconstruction from data obtained from 24 IMU locations, while performing on par with biRNNs when only 6 IMUs are used. The enhanced accuracy provided by our optimally chosen locations, coupled with the parallelizability and performance of transformers, offers significant improvements for IMU-based pose estimation.
