Consistency Based Weakly Self-Supervised Learning for Human Activity Recognition with Wearables
Taoran Sheng, Manfred Huber
TL;DR
This work tackles wearable HAR under limited labeling by introducing a weakly self-supervised learning framework that combines a ResNet-based autoencoder with Siamese networks. It enforces temporal and feature consistencies to shape a meaningful embedding space. It then uses a two-stage joint loss, first self-supervised and then lightly supervised via pairwise constraints, to refine the clusters with few labels. Experimental results on PAMAP2, REALDISP, and SBHAR show substantial improvements over unsupervised baselines and performance competitive with fully supervised methods while using only 10% of the labels, which greatly reduces the labeling effort for HAR in ubiquitous sensing scenarios. The approach yields clustering-ready representations that are also suitable for downstream classification, enabling scalable HAR with minimal annotation burden.
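The two-stage joint loss can be illustrated with a small NumPy sketch. It is not the authors' formulation. It assumes a standard margin-based contrastive loss for the Siamese branches plus a mean-squared reconstruction term for the autoencoder. The function names, the `weight` and `margin` parameters, and the choice to treat temporally adjacent windows as similar pairs in stage one (`adjacent_pairs`) are all hypothetical, chosen only for illustration.

```python
import numpy as np

def contrastive_loss(z_a, z_b, similar, margin=1.0):
    """Margin-based contrastive loss over Siamese pairs (illustrative choice).

    z_a, z_b: (n, d) embeddings from the two weight-sharing branches.
    similar:  (n,) array with 1 for similar pairs and 0 for dissimilar pairs.
    """
    similar = np.asarray(similar, dtype=float)
    d = np.linalg.norm(z_a - z_b, axis=1)                      # Euclidean distance per pair
    pull = similar * d ** 2                                    # similar pairs: shrink distance
    push = (1.0 - similar) * np.maximum(0.0, margin - d) ** 2  # dissimilar: push beyond margin
    return float(np.mean(pull + push))

def reconstruction_loss(x, x_hat):
    """Mean squared error between input windows and autoencoder output."""
    return float(np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2))

def adjacent_pairs(z):
    """Hypothetical stage-one pairing: treat consecutive windows as similar."""
    z = np.asarray(z)
    return z[:-1], z[1:], np.ones(len(z) - 1)

def joint_loss(x, x_hat, z_a, z_b, similar, weight=1.0, margin=1.0):
    """Reconstruction term plus a weighted pairwise consistency term."""
    return reconstruction_loss(x, x_hat) + weight * contrastive_loss(z_a, z_b, similar, margin)
```

In this sketch, stage one would feed `joint_loss` with pairs from `adjacent_pairs`, and stage two would pass labeled pairwise constraints instead. Only the source of the `similar` flags changes, which keeps the two stages under a single objective.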
Abstract
While the embedded sensors widely available in smartphones and other wearable devices make it easier to collect data on human activities, recognizing different types of human activities from sensor data remains a difficult research problem in ubiquitous computing. One reason is that most of the collected data is unlabeled, whereas many current human activity recognition (HAR) systems are based on supervised methods that rely heavily on labeled data. In this paper, we describe a weakly self-supervised approach that consists of two stages: (1) in stage one, the model learns from the inherent characteristics of human activities by projecting the data into an embedding space where similar activities are grouped together; (2) in stage two, the model is fine-tuned in a few-shot learning fashion using similarity information about the data. The resulting embeddings can then benefit downstream classification or clustering tasks. Experiments on three benchmark datasets demonstrate the framework's effectiveness. They show that our approach helps a clustering algorithm identify and categorize the underlying human activities. Its performance is comparable to that of purely supervised techniques applied directly to the corresponding fully labeled dataset.
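The similarity information used in stage two can be pictured as pairwise constraints derived from a small labeled subset. The sketch below is a hypothetical illustration, not the paper's procedure. `pairwise_constraints` randomly selects a `fraction` of the windows to label (10% by default, matching the label budget in the TL;DR). It then emits every pair among them, marking same-activity pairs as must-link (1) and different-activity pairs as cannot-link (0). The function name, the random sampling scheme, and the `seed` parameter are assumptions.

```python
import itertools
import numpy as np

def pairwise_constraints(labels, fraction=0.1, seed=0):
    """Hypothetical helper: turn a small labeled subset into pairwise constraints.

    Returns a list of (i, j, similar) tuples with i < j, where similar is
    1 for a must-link pair (same activity) and 0 for a cannot-link pair.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    # Number of windows assumed to be labeled; at least two are needed to form a pair.
    n_labeled = max(2, int(round(fraction * len(labels))))
    idx = rng.choice(len(labels), size=n_labeled, replace=False)
    pairs = []
    for i, j in itertools.combinations(sorted(idx.tolist()), 2):
        pairs.append((i, j, int(labels[i] == labels[j])))
    return pairs
```

The number of pairs grows quadratically with the size of the labeled subset, so a practical pipeline would likely subsample pairs. That choice is left open here.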
