Comparing Self-Supervised Learning Techniques for Wearable Human Activity Recognition
Sannara Ek, Riccardo Presotto, Gabriele Civitarese, François Portet, Philippe Lalanda, Claudio Bettini
TL;DR
This work tackles the challenge of wearable HAR under limited labeled data by evaluating three state-of-the-art SSL approaches from different families, namely SimCLR (contrastive), MAE (generative), and data2vec (predictive), on CNN-based and transformer-based backbones within a Leave-One-Dataset-Out evaluation framework. The study shows that MAE is the most robust SSL method when the feature extractor is kept frozen, often outperforming both the other SSL methods and supervised pre-training, and that this advantage persists in data-scarce scenarios. With full fine-tuning (feature extractor unfrozen), MAE remains competitive with supervised methods when using transformer architectures, and in extremely low-resource settings it provides strong transfer benefits. Together with the publicly released code and pre-trained models, these findings underscore MAE’s practicality for robust, data-efficient HAR and lay a foundation for future exploration of SSL techniques on wearables.
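As a rough illustration of a Leave-One-Dataset-Out protocol, the sketch below rotates through a collection of datasets, holding each one out exactly once as the target while pooling the remaining ones (labels ignored) for self-supervised pre-training. This is a minimal sketch under assumptions: the pooling rule reflects a common reading of this kind of protocol rather than the authors' released code, and the `lodo_splits` helper and the dataset names are hypothetical placeholders.

```python
from typing import Dict, Iterator, List, Tuple


def lodo_splits(datasets: Dict[str, List]) -> Iterator[Tuple[str, List, List]]:
    """Yield (held_out_name, pretrain_pool, target_data) for each fold.

    Hypothetical helper: every dataset is held out exactly once, and the
    windows of all other datasets are pooled for SSL pre-training.
    """
    for held_out in datasets:
        # Pool windows from every dataset except the held-out one.
        pretrain_pool = [
            window
            for name, windows in datasets.items()
            if name != held_out
            for window in windows
        ]
        yield held_out, pretrain_pool, datasets[held_out]


if __name__ == "__main__":
    # Placeholder "datasets" whose windows are just integers; names are illustrative.
    toy = {"dataset_a": [1, 2], "dataset_b": [3], "dataset_c": [4, 5, 6]}
    for name, pool, target in lodo_splits(toy):
        print(name, len(pool), len(target))
```

In such a setup, each fold would pre-train an SSL model on `pretrain_pool`, then either train a classifier on top of the frozen feature extractor or fine-tune the whole network on labeled windows from `target_data` and evaluate there, so the target dataset is never seen during pre-training.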
Abstract
Human Activity Recognition (HAR) based on the sensors of mobile/wearable devices aims to detect the physical activities performed by humans in their daily lives. Although supervised learning methods are the most effective for this task, their effectiveness depends on a large amount of labeled data for training. While collecting raw unlabeled data can be relatively easy, annotating data is challenging due to costs, intrusiveness, and time constraints. To address these challenges, this paper explores alternative approaches for accurate HAR using a limited amount of labeled data. In particular, we have adapted recent Self-Supervised Learning (SSL) algorithms to the HAR domain and compared their effectiveness. We investigate three state-of-the-art SSL techniques from different families: contrastive, generative, and predictive. Additionally, we evaluate the impact of the underlying neural network on the recognition rate by comparing state-of-the-art CNN and transformer architectures. Our results show that a Masked Autoencoder (MAE) approach significantly outperforms the other SSL approaches, including SimCLR, commonly considered one of the best-performing SSL methods in the HAR domain. The code and the pre-trained SSL models are publicly available for further research and development.
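The generative family can be illustrated with the masking step at the heart of an MAE-style objective. The sketch below assumes a tri-axial accelerometer window split into fixed-length temporal patches; it hides a random subset of patches and scores a reconstruction only on the hidden ones. The window shape, patch length, 75% mask ratio, and the function names `patchify`, `random_mask`, and `masked_mse` are illustrative assumptions rather than the paper's settings, and the all-zeros prediction merely stands in for a real encoder-decoder.

```python
import numpy as np


def patchify(window: np.ndarray, patch_len: int) -> np.ndarray:
    """Split a (time, channels) window into (num_patches, patch_len * channels)."""
    t, c = window.shape
    if t % patch_len != 0:
        raise ValueError("window length must be divisible by patch_len")
    # Row-major reshape: each row holds patch_len consecutive time steps.
    return window.reshape(t // patch_len, patch_len * c)


def random_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator):
    """Return sorted indices of visible and masked patches."""
    num_masked = int(round(num_patches * mask_ratio))
    perm = rng.permutation(num_patches)
    masked = np.sort(perm[:num_masked])
    visible = np.sort(perm[num_masked:])
    return visible, masked


def masked_mse(pred: np.ndarray, target: np.ndarray, masked: np.ndarray) -> float:
    """MAE-style loss: mean squared error computed on masked patches only."""
    diff = pred[masked] - target[masked]
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    window = rng.standard_normal((128, 3))  # 128 time steps, 3 accelerometer axes
    patches = patchify(window, patch_len=16)  # 8 patches of 16 * 3 values
    visible, masked = random_mask(len(patches), mask_ratio=0.75, rng=rng)
    encoder_input = patches[visible]  # an encoder would only see these patches
    pred = np.zeros_like(patches)  # stand-in for a decoder's reconstruction
    print(patches.shape, encoder_input.shape, masked_mse(pred, patches, masked))
```

Computing the loss only on masked patches is what forces the encoder to infer missing motion from the visible context instead of simply copying its input.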
