Bio-Inspired Self-Supervised Learning for Wrist-worn IMU Signals

Prithviraj Tarale, Kiet Chu, Abhishek Varghese, Kai-Chun Liu, Maxwell A Xu, Mohit Iyyer, Sunghoon I. Lee

TL;DR

This work introduces a novel tokenization strategy grounded in the submovement theory of motor control, which posits that continuous wrist motion is composed of superposed elementary basis functions called submovements. It then pretrains a Transformer encoder via masked movement-segment reconstruction to model the temporal dependencies of movement segments.

Abstract

Wearable accelerometers have enabled large-scale health and wellness monitoring, yet learning robust human-activity representations has been constrained by the scarcity of labeled data. While self-supervised learning offers a potential remedy, existing approaches treat sensor streams as unstructured time series, overlooking the underlying biological structure of human movement, a factor we argue is critical for effective Human Activity Recognition (HAR). We introduce a novel tokenization strategy grounded in the submovement theory of motor control, which posits that continuous wrist motion is composed of superposed elementary basis functions called submovements. We define our token as the movement segment, a unit of motion composed of a finite sequence of submovements that is readily extractable from wrist accelerometer signals. By treating these segments as tokens, we pretrain a Transformer encoder via masked movement-segment reconstruction to model the temporal dependencies of movement segments, shifting the learning focus beyond local waveform morphology. Pretrained on the NHANES corpus (approximately 28k hours; approximately 11k participants; approximately 10M windows), our representations outperform strong wearable SSL baselines across six subject-disjoint HAR benchmarks. Furthermore, they demonstrate stronger data efficiency in data-scarce settings. Code and pretrained weights will be made publicly available.
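
The masked movement-segment reconstruction objective described above can be illustrated with a minimal sketch. This is not the authors' implementation: the mask ratio, the zero vector standing in for a learned mask embedding, the mean-squared-error loss, and the choice to score only masked positions are all assumptions made for illustration, and the function names `mask_tokens` and `masked_mse` are hypothetical. A real setup would pass the corrupted sequence through a Transformer encoder to obtain the predictions.

```python
import random


def mask_tokens(tokens, mask_ratio=0.5, seed=0):
    """Hide a random subset of movement-segment tokens.

    tokens: list of equal-length lists of floats (one list per segment).
    mask_ratio is an assumed value, not taken from the paper.
    Returns the corrupted copy and the sorted masked indices.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_ratio * len(tokens)))
    masked_idx = sorted(rng.sample(range(len(tokens)), n_mask))
    token_len = len(tokens[0])
    corrupted = [list(t) for t in tokens]
    for i in masked_idx:
        # A zero vector stands in for a learned mask embedding (assumption).
        corrupted[i] = [0.0] * token_len
    return corrupted, masked_idx


def masked_mse(predicted, target, masked_idx):
    """Mean squared error computed over the masked tokens only (assumption)."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(predicted[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


# Toy sequence of four two-sample tokens.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
corrupted, masked_idx = mask_tokens(tokens, mask_ratio=0.5)
# An encoder would map `corrupted` to predictions; here the corrupted input
# itself is scored, which gives the loss of a model that predicts nothing.
baseline_loss = masked_mse(corrupted, tokens, masked_idx)
```

Because the loss is restricted to hidden segments in this sketch, a model can only reduce it by using the surrounding segments as context, which is the property the abstract attributes to the pretraining task.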

Paper Structure (61 sections, 3 equations, 12 figures, 6 tables)


Figures (12)

  • Figure 1: Bio-PM Representation Learning. We (i) tokenize accelerometry into movement-aligned segments, (ii) pretrain by modeling temporal relations across segments with a Transformer under masked reconstruction, and (iii) transfer the frozen encoder to downstream HAR for linear probing.
  • Figure 2: Movement segments in the velocity domain arise from overlapping submovements; we operationalize analogous units directly from accelerometry by defining tokens between successive acceleration zero-crossings (corresponding to velocity extrema). Example shown for walking (MHealth); Appendix Fig. 5 details the full tokenization pipeline.
  • Figure 3: Next-token prediction accuracy on unseen token transitions using Bio-PM embeddings. Contextual embeddings outperform the non-contextual baseline, and shuffling largely removes this gain, indicating reliance on temporal organization rather than token identity alone. Chance prediction is $\frac{1}{K}$, with $K$ selected per dataset by silhouette score. HAD* is less stable due to substantially fewer unique transitions ($\approx$ 250 vs. 2.3k--27k).
  • Figure 4: Macro-F1 of frozen linear probes as a function of the fraction of labeled subjects used for training (subject-disjoint splits). Bio-PM remains competitive at low label fractions and scales more favorably with additional labeled subjects than controlled SSL baselines.
  • Figure 5: End-to-end conversion from raw wrist accelerometer signals to fixed-length movement-segment tokens: gravity separation via filtering, zero-crossing boundary detection, segmentation, and resampling to a common token length, along with token metadata used for sequence modeling.
  • ...and 7 more figures
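
The tokenization pipeline summarized in the Figure 5 caption (gravity separation, zero-crossing boundary detection, segmentation, and resampling to a common token length) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it handles a single acceleration axis, uses a centered moving average as a stand-in for the unspecified gravity filter, resamples by linear interpolation, and keeps only start index and duration as token metadata. The function names, `token_length=32`, and `gravity_window` are hypothetical.

```python
import math


def remove_gravity(signal, window):
    """Subtract a centered moving average as a crude gravity estimate.
    The paper's actual filter is not specified here (assumption)."""
    n, half = len(signal), window // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(signal[i] - sum(signal[lo:hi]) / (hi - lo))
    return out


def zero_crossings(signal):
    """Indices i where the sign changes between signal[i-1] and signal[i]."""
    return [i for i in range(1, len(signal))
            if (signal[i - 1] < 0 <= signal[i]) or (signal[i - 1] >= 0 > signal[i])]


def resample(segment, length):
    """Linearly interpolate a segment to a fixed number of samples."""
    if len(segment) == 1:
        return [segment[0]] * length
    scale = (len(segment) - 1) / (length - 1)
    out = []
    for k in range(length):
        pos = k * scale
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out


def tokenize(signal, token_length=32, gravity_window=101):
    """Cut the gravity-free signal at successive zero-crossings and
    resample each piece to token_length samples."""
    if token_length < 2:
        raise ValueError("token_length must be at least 2")
    dynamic = remove_gravity(signal, gravity_window)
    bounds = zero_crossings(dynamic)
    tokens = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        tokens.append({
            "values": resample(dynamic[start:end], token_length),
            "start": start,           # hypothetical metadata: first sample index
            "duration": end - start,  # hypothetical metadata: length in samples
        })
    return tokens


# Synthetic example: a 2 Hz oscillation on a 1 g offset, sampled at 50 Hz.
signal = [1.0 + math.sin(2 * math.pi * i / 25) for i in range(200)]
tokens = tokenize(signal, token_length=32, gravity_window=25)
```

Keeping the original duration alongside each resampled token preserves timing information that resampling to a fixed length would otherwise discard, which is one plausible reading of the "token metadata" mentioned in the caption.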