PRIMUS: Pretraining IMU Encoders with Multimodal Self-Supervision

Arnav M. Das; Chi Ian Tang; Fahim Kawsar; Mohammad Malekzadeh

TL;DR

PRIMUS addresses the challenge of learning transferable IMU representations under label scarcity by pretraining an IMU encoder with a multi-objective loss that combines self-supervision, multimodal supervision, and nearest-neighbor supervision. It aligns IMU features with video and text through L_MM, enforces augmentation invariance via L_SS, and exploits cross-instance signals with L_NN. With this objective, PRIMUS improves few-shot and out-of-domain activity recognition, outperforming prior IMU pretraining methods by up to about 15 percentage points. The approach is data-efficient, robust across domains, and practical for mobile wearables, and its code is open source. These results suggest that integrating diverse supervisory signals during pretraining yields transferable IMU encoders suited to real-world health and activity monitoring applications.
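To make the three terms concrete, here is a minimal NumPy sketch of how L_SS, L_MM, and L_NN could be combined. This summary does not give the exact formulation, so several things here are assumptions: each term is written as an InfoNCE-style contrastive loss, L_NN follows an NNCLR-style lookup in a hypothetical memory bank, and the terms are summed with equal default weights. The function names, the `bank` argument, and the temperature value are illustrative and do not come from the PRIMUS code.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i of `anchors` should match row i of `positives`.

    All other rows in the batch act as negatives.
    """
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature
    # Subtract the row-wise max for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def nearest_neighbors(queries, bank):
    """Return, for each query row, the most cosine-similar row of `bank`."""
    sims = l2_normalize(queries) @ l2_normalize(bank).T
    return bank[np.argmax(sims, axis=1)]


def combined_loss(imu, imu_aug, video, text, bank, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms (equal weights are an assumption)."""
    # L_SS: two augmented views of the same IMU window should agree.
    l_ss = info_nce(imu, imu_aug)
    # L_MM: IMU embeddings should align with paired video and text embeddings.
    l_mm = 0.5 * (info_nce(imu, video) + info_nce(imu, text))
    # L_NN (assumed NNCLR-style): the nearest bank entry to one view
    # serves as the positive for the other view.
    l_nn = info_nce(imu_aug, nearest_neighbors(imu, bank))
    w_ss, w_mm, w_nn = weights
    return w_ss * l_ss + w_mm * l_mm + w_nn * l_nn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, dim = 8, 16
    imu = rng.normal(size=(batch, dim))            # stand-in IMU embeddings
    imu_aug = imu + 0.05 * rng.normal(size=imu.shape)  # stand-in augmented view
    video = rng.normal(size=(batch, dim))          # stand-in video embeddings
    text = rng.normal(size=(batch, dim))           # stand-in text embeddings
    bank = rng.normal(size=(64, dim))              # hypothetical memory bank
    print(combined_loss(imu, imu_aug, video, text, bank))
```

In a real training loop these embeddings would come from the IMU encoder and from video and text encoders, and the loss would be written in an autodiff framework so gradients can flow back to the IMU encoder. The NumPy version only shows how the three supervisory signals relate.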

Abstract

Sensing human motions through Inertial Measurement Units (IMUs) embedded in personal devices has enabled significant applications in health and wellness. Labeled IMU data is scarce; however, unlabeled or weakly labeled IMU data can be used to model human motions. For video or text modalities, the "pretrain and adapt" approach utilizes large volumes of unlabeled or weakly labeled data to build a strong feature extractor, followed by adaptation to specific tasks using limited labeled data. However, pretraining methods are poorly understood for IMU data, and pipelines are rarely evaluated on out-of-domain tasks. We propose PRIMUS: a method for PRetraining IMU encoderS that uses a novel pretraining objective, validated empirically by downstream performance on both in-domain and out-of-domain datasets. The PRIMUS objective enhances downstream performance by combining self-supervision, multimodal supervision, and nearest-neighbor supervision. With fewer than 500 labeled samples per class, PRIMUS improves test accuracy by up to 15% compared to state-of-the-art baselines. To benefit the broader community, we have open-sourced our code at github.com/nokia-bell-labs/pretrained-imu-encoders.
