Self-supervised Learning for Human Activity Recognition Using 700,000 Person-days of Wearable Data

Hang Yuan, Shing Chan, Andrew P. Creagh, Catherine Tong, Aidan Acquah, David A. Clifton, Aiden Doherty

TL;DR

This work addresses the scarcity of labelled data in human activity recognition (HAR) by applying self-supervised learning (SSL) to a large unlabelled wearable dataset from UK Biobank. It introduces a multi-task SSL framework that combines a ResNet-V2 backbone with three pretext tasks: arrow of time (AoT), permutation, and time warping. The framework learns transferable representations that generalise across seven external HAR benchmarks. The pre-trained models consistently outperform both training from scratch and random forests, with the largest gains on small datasets, and the learned representations capture motion dynamics and activity intensity. The study also examines model interpretability using layer-wise relevance propagation (LRP), reports ablation analyses to guide data collection and model deployment, and releases open-source models to support future research and deployment in resource-constrained settings.
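The three pretext transformations named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the `(300, 3)` window shape (ten seconds at an assumed 30 Hz), the number of segments, the warping parameters, and the 50% chance of applying each transformation are all assumptions made for the example.

```python
import numpy as np


def reverse_time(window):
    """Arrow of time: flip the window along the time axis."""
    return window[::-1].copy()


def permute_segments(window, n_segments=4, rng=None):
    """Permutation: cut the window into segments and shuffle their order."""
    rng = np.random.default_rng() if rng is None else rng
    segments = np.array_split(window, n_segments, axis=0)
    order = rng.permutation(n_segments)
    return np.concatenate([segments[i] for i in order], axis=0)


def time_warp(window, n_knots=4, sigma=0.2, rng=None):
    """Time warping: locally stretch and compress the time axis."""
    rng = np.random.default_rng() if rng is None else rng
    n = window.shape[0]
    t = np.arange(n)
    # Random positive "speeds" at a few knots, interpolated to every sample.
    knot_pos = np.linspace(0, n - 1, n_knots + 2)
    knot_speed = np.clip(rng.normal(1.0, sigma, size=n_knots + 2), 0.1, None)
    speed = np.interp(t, knot_pos, knot_speed)
    # Cumulative speed gives a monotonic warped time axis, rescaled to [0, n-1].
    warped_t = np.cumsum(speed)
    warped_t = (warped_t - warped_t[0]) / (warped_t[-1] - warped_t[0]) * (n - 1)
    # Resample each accelerometer axis at the warped time points.
    return np.stack(
        [np.interp(warped_t, t, window[:, c]) for c in range(window.shape[1])],
        axis=1,
    )


def make_pretext_example(window, rng):
    """Apply each transformation with probability 0.5 (an assumption) and
    return the transformed window plus one binary label per task."""
    tasks = (
        ("aot", reverse_time),
        ("permutation", lambda w: permute_segments(w, rng=rng)),
        ("time_warp", lambda w: time_warp(w, rng=rng)),
    )
    x, labels = window, {}
    for name, transform in tasks:
        applied = bool(rng.random() < 0.5)
        if applied:
            x = transform(x)
        labels[name] = int(applied)
    return x, labels
```

In a multi-task setup of this kind, a shared backbone would receive the transformed window and one small binary head per task would predict whether that task's transformation was applied; the labels come for free from the transformation itself, which is why no human annotation is needed.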

Abstract

Advances in deep learning for human activity recognition have been relatively limited due to the lack of large labelled datasets. In this study, we leverage self-supervised learning techniques on the UK-Biobank activity tracker dataset, the largest of its kind to date, containing more than 700,000 person-days of unlabelled wearable sensor data. Our resulting activity recognition model consistently outperformed strong baselines across seven benchmark datasets, with an F1 relative improvement of 2.5%-100% (median 18.4%), the largest improvements occurring in the smaller datasets. In contrast to previous studies, our results generalise across external datasets, devices, and environments. Our open-source model will help researchers and developers to build customisable and generalisable activity classifiers with high performance.
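For readers unfamiliar with the metric, the relative improvement quoted above is presumably computed in the conventional way, as the gain over the baseline F1 divided by the baseline F1. The abstract does not spell out the formula, so the definition and the subscripts below are assumptions:

```latex
\[
\text{relative improvement}
  = \frac{F1_{\text{SSL}} - F1_{\text{baseline}}}{F1_{\text{baseline}}} \times 100\%
\]
```

Under this definition, a hypothetical baseline F1 of 0.50 rising to 0.75 would be a 50% relative improvement, and a 100% relative improvement corresponds to a doubling of the baseline F1. These numbers are illustrative only and are not taken from the paper.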

Paper Structure

This paper contains 37 sections, 1 equation, 10 figures, 7 tables.

Figures (10)

  • Figure 1: A ten-second window of tri-axial accelerometer data when a person is ironing in its original form and three corresponding transformations: reversed, permuted, and time-warped.
  • Figure 2: Test accuracy curves for four self-supervised tasks, each trained individually on 1000 subjects from the UK-Biobank, with and without weighted sampling. The early-stopping patience was five.
  • Figure 3: How different models perform in the downstream tasks when we change the number of labelled subjects: Capture-24 (left) and Rowlands (right).
  • Figure 4: Left: Using more unlabelled subjects for SSL pre-training leads to better downstream performance. Right: Increasing the amount of unlabelled data per subject does not improve downstream performance when 10,000 subjects are used for pre-training.
  • Figure 6: Cluster analysis on raw inputs, untrained features and SSL-pretrained features. We use color gradients to denote activity intensities. Results suggest that SSL-derived features are better at clustering similar activities (e.g. walking, stair climbing vs. sitting, writing, typing) as well as their intensities (e.g. lying down, sitting, standing vs. jogging, sports).
  • ...and 5 more figures