RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data

Maxwell A. Xu, Jaya Narain, Gregory Darnell, Haraldur Hallgrimsson, Hyewon Jeong, Darren Forde, Richard Fineman, Karthik J. Raghuram, James M. Rehg, Shirley Ren

TL;DR

RelCon addresses the lack of generalizable foundation models for health time-series by introducing a motif-based, learnable distance and a relative contrastive loss tailored to accelerometry. It pretrains on 1 billion segments from 87,376 AHMS participants using a 256-dimensional embedding produced by a 1D ResNet-34 backbone, achieving state-of-the-art results across gait- and HAR-related tasks and demonstrating cross-task generalization. Key contributions include (i) a learnable, accelerometry-specific distance, (ii) a relative, hierarchical loss that preserves nuanced similarities, and (iii) extensive ablations showing the necessity of augmentations, RevIN, and within-subject dynamics for robust performance. The findings suggest that motion foundation models trained on real-world wearable data can generalize across diverse downstream analyses, with potential applicability to other biosignals and multi-location sensor settings.

Abstract

We present RelCon, a novel self-supervised Relative Contrastive learning approach for training a motion foundation model from wearable accelerometry sensors. First, a learnable distance measure is trained to capture motif similarity and domain-specific semantic information such as rotation invariance. Then, the learned distance provides a measurement of semantic similarity between a pair of accelerometry time-series, which we use to train our foundation model to model relative relationships across time and across subjects. The foundation model is trained on 1 billion segments from 87,376 participants, and achieves state-of-the-art performance across multiple downstream tasks, including human activity recognition and gait metric regression. To our knowledge, we are the first to show the generalizability of a foundation model with motion data from wearables across distinct evaluation tasks.

Paper Structure

This paper contains 28 sections, 3 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: t-SNE of representation spaces with perplexity=100. Our RelCon approach produces the clearest clusters by semantic class, even forming a distinct swimming cluster that does not appear with the other methods. RelCon also separates Indoor and Outdoor Cycling more clearly.
  • Figure 2: SOTA Accelerometry SSL Methods. Each sequence color represents a different user's time-series. RelCon draws candidates from both within and across users and ranks them by their relative similarity via a learnable distance function. Then, it iteratively applies a contrastive loss, selecting one candidate as the positive while treating the more distant candidates as negatives. This helps prevent false positives/negatives because the full relative ranking is captured. Prior approaches define a single positive/negative set, risking semantic errors if the positives or negatives are misdefined. AugPred and SimCLR are resistant to false positives because they construct positive pairs via semantics-preserving augmentations, but REBAR does not have a semantically-constrained pair construction.
  • Figure 3: Correlation between predicted and true DST. RelCon has the highest correlation.
  • Figure 4: ROC curves of Workout-Level Field Human Activity Recognition. Among other gains, RelCon classifies stair climbing (purple) clearly better than the other approaches. Further class-specific prediction results can be found in the confusion matrices in Fig. 5.
  • Figure 5: Confusion Matrices for AHMS Field Human Activity Recognition at the Workout Level. RelCon has the best performance. Unlike REBAR and AugPred, RelCon better distinguishes Outdoor Running from Indoor Running. RelCon also predicts Stair Climbing better than the other methods.
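
The candidate-ranking procedure described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of cosine similarity, the temperature value, and the averaging over positives are all assumptions made for illustration, and the learned distance is stood in for by a list of precomputed numbers. Each candidate in turn is treated as the positive, and every candidate that the distance ranks as farther from the anchor is treated as a negative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def relative_contrastive_loss(anchor, candidates, distances, temperature=0.1):
    """Hypothetical sketch of a relative contrastive loss.

    anchor      : embedding of the anchor segment
    candidates  : embeddings of the candidate segments
    distances   : distance from the anchor to each candidate, assumed to come
                  from a separately trained distance function (smaller = more similar)
    temperature : softmax temperature (assumed value, not from the paper)
    """
    logits = [cosine(anchor, c) / temperature for c in candidates]
    losses = []
    for i, d_pos in enumerate(distances):
        # Negatives are all candidates ranked strictly farther than candidate i.
        negatives = [logits[j] for j, d in enumerate(distances) if d > d_pos]
        if not negatives:
            continue  # the farthest candidate has no negatives to contrast against
        terms = [logits[i]] + negatives
        peak = max(terms)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(t - peak) for t in terms))
        losses.append(log_denominator - logits[i])
    # Averaging over positives is an assumption of this sketch.
    return sum(losses) / len(losses) if losses else 0.0


anchor = [1.0, 0.0]
candidates = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]
# Distances that agree with the embedding geometry, then the reversed ranking.
agree = relative_contrastive_loss(anchor, candidates, [0.1, 0.5, 1.0, 2.0])
disagree = relative_contrastive_loss(anchor, candidates, [2.0, 1.0, 0.5, 0.1])
print(agree < disagree)
```

In this toy example the loss is small when the embedding similarities follow the same ordering as the distances and large when the ordering is reversed, which is the pressure that pushes an encoder to preserve the full relative ranking rather than a single positive/negative split.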