
Comparing Self-Supervised Learning Techniques for Wearable Human Activity Recognition

Sannara Ek, Riccardo Presotto, Gabriele Civitarese, François Portet, Philippe Lalanda, Claudio Bettini

TL;DR

This work addresses wearable HAR with limited labeled data by evaluating three state-of-the-art SSL approaches, SimCLR (contrastive), MAE (generative), and data2vec (predictive), on CNN-based and transformer-based backbones under a Leave-One-Dataset-Out framework. MAE is the most robust SSL method when the feature extractor is kept frozen: it often outperforms both the other SSL methods and supervised pre-training, and this advantage persists in data-scarce scenarios. Under full fine-tuning (feature extractor unfrozen), MAE remains competitive with supervised methods on transformer architectures and provides strong transfer benefits in extremely low-resource settings. Together with the public code and pre-trained models, these findings support MAE as a practical choice for robust, data-efficient HAR and provide a basis for further study of SSL techniques for wearables.
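
As a rough illustration of the Leave-One-Dataset-Out idea mentioned above, the sketch below enumerates the splits: each dataset is held out in turn while the remaining ones form the pre-training pool. The function name and the placeholder dataset names are hypothetical (only HHAR is named on this page), and the assumption that the held-out dataset is the one used for downstream fine-tuning and evaluation is ours, not a statement of the authors' exact protocol.

```python
from typing import Iterator, List, Tuple


def leave_one_dataset_out(datasets: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (pre-training pool, held-out dataset) pairs, one per dataset."""
    for held_out in datasets:
        # Every dataset except the held-out one goes into the pre-training pool.
        pretrain_pool = [name for name in datasets if name != held_out]
        yield pretrain_pool, held_out


# Placeholder names for illustration; "dataset_B" and "dataset_C" are hypothetical.
names = ["HHAR", "dataset_B", "dataset_C"]
splits = list(leave_one_dataset_out(names))

for pretrain_pool, held_out in splits:
    print(f"pre-train on {pretrain_pool}; fine-tune and evaluate on {held_out}")
```

With N datasets this produces N splits, so every dataset serves once as the unseen target.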

Abstract

Human Activity Recognition (HAR) based on the sensors of mobile and wearable devices aims to detect the physical activities that people perform in their daily lives. Although supervised learning methods are the most effective for this task, they require a large amount of labeled training data. Collecting raw unlabeled data is relatively easy, but annotating it is difficult because of cost, intrusiveness, and time constraints. To address these challenges, this paper explores alternative approaches for accurate HAR that use only a limited amount of labeled data. In particular, we adapt recent Self-Supervised Learning (SSL) algorithms to the HAR domain and compare their effectiveness. We investigate three state-of-the-art SSL techniques from different families: contrastive, generative, and predictive. Additionally, we evaluate the impact of the underlying neural network on the recognition rate by comparing state-of-the-art CNN and transformer architectures. Our results show that a Masked Autoencoder (MAE) approach significantly outperforms the other SSL approaches, including SimCLR, which is commonly considered one of the best-performing SSL methods in the HAR domain. The code and the pre-trained SSL models are publicly available for further research and development.
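
Masked autoencoders in general pre-train by hiding part of the input and reconstructing it from the visible remainder. The sketch below shows only the masking step in its simplest form: choosing which patches of a sensor window to hide. It is a generic illustration, not the authors' implementation; the function name, the patch count of 16, and the mask ratio of 0.75 are assumptions chosen for the example.

```python
import random
from typing import List, Tuple


def random_patch_mask(
    num_patches: int, mask_ratio: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (visible, masked) patch indices for one input window."""
    rng = random.Random(seed)  # local generator so the split is reproducible
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])    # hidden from the encoder
    visible = sorted(order[num_masked:])   # the only patches the encoder sees
    return visible, masked


# Hypothetical setting: a window cut into 16 patches, 75% of them masked.
visible, masked = random_patch_mask(num_patches=16, mask_ratio=0.75)
print("visible patches:", visible)
print("masked patches:", masked)
```

In a full pipeline, an encoder would process the visible patches and a decoder would be trained to reconstruct the masked ones; those parts are omitted here.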

Paper Structure (22 sections, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Overview of SimCLR for HAR
  • Figure 2: Overview of Masked Autoencoder (MAE) for HAR
  • Figure 3: Overview of data2vec for HAR
  • Figure 4: Overview of the Leave-One-Dataset-Out evaluation methodology
  • Figure 5: t-SNE projections of the studied SSL methods on the HHAR dataset
  • ...and 3 more figures