DISCOVER: Data-driven Identification of Sub-activities via Clustering and Visualization for Enhanced Activity Recognition in Smart Homes
Alexander Karpekov, Sonia Chernova, Thomas Plötz
TL;DR
DISCOVER addresses the high cost of labeled data and the need for flexible activity granularity in smart-home HAR with a self-supervised, two-stage pipeline that discovers fine-grained sub-activities from unlabeled ambient-sensor streams without pre-segmentation. It combines a BERT-based encoder pre-trained with masked language modeling on sensor tokens, SCAN-based clustering to form coherent sub-activity clusters, and a visualization-driven centroid annotation workflow whose labels are propagated back to the full dataset. Evaluated on the CASAS Milan, Aruba, and Cairo datasets, DISCOVER yields semantically meaningful sub-activities that reveal finer distinctions within the coarse CASAS labels, while substantially reducing annotation effort. The approach shows practical potential for scalable, adaptable HAR in real-world homes, and comes with an open-source tool that supports annotation and re-annotation at varying granularities and across diverse environments.
Abstract
Human Activity Recognition (HAR) using ambient sensors has great potential for practical applications, particularly in elder care and independent living. However, deploying HAR systems in real-world settings remains challenging due to the high cost of labeled data, the need for pre-segmented sensor streams, and the lack of flexibility in activity granularity. To address these limitations, we introduce DISCOVER, a method designed to discover fine-grained human sub-activities from unlabeled sensor data without relying on pre-segmentation. DISCOVER combines unsupervised feature extraction and clustering with a user-friendly visualization tool to streamline the labeling process. DISCOVER enables domain experts to efficiently annotate only a minimal set of representative cluster centroids, reducing the annotation workload to a small number of samples (0.05% of our dataset). We demonstrate DISCOVER's effectiveness through a re-annotation exercise on widely used HAR datasets, showing that it uncovers finer-grained activities and produces more nuanced annotations than traditional coarse labels. DISCOVER represents a step toward practical, deployable HAR systems that adapt to diverse real environments.
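The centroid annotation and label propagation step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes embeddings and cluster assignments are already available (in DISCOVER these would come from the BERT-based encoder and SCAN clustering), it assumes the sample nearest to each cluster mean is the one shown to the annotator, and the function names, toy embeddings, and activity labels are all hypothetical.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pick_representatives(embeddings, cluster_ids):
    """Return {cluster_id: index of the member closest to the cluster mean}.

    Assumption: one representative per cluster is shown to the annotator.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    representatives = {}
    for cid, idxs in members.items():
        dim = len(embeddings[idxs[0]])
        centroid = [sum(embeddings[i][d] for i in idxs) / len(idxs)
                    for d in range(dim)]
        representatives[cid] = min(
            idxs, key=lambda i: squared_distance(embeddings[i], centroid)
        )
    return representatives


def propagate_labels(cluster_ids, cluster_labels):
    """Give every sample the label assigned to its cluster's representative."""
    return [cluster_labels[cid] for cid in cluster_ids]


# Toy 2-D embeddings standing in for encoder outputs (hypothetical values).
embeddings = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1),
              (5.0, 5.0), (5.2, 5.0), (5.1, 4.9)]
cluster_ids = [0, 0, 0, 1, 1, 1]  # assumed output of the clustering stage

representatives = pick_representatives(embeddings, cluster_ids)

# The annotator labels only the representatives (hypothetical labels).
cluster_labels = {0: "cooking at stove", 1: "washing dishes"}
labels = propagate_labels(cluster_ids, cluster_labels)

print(representatives)
print(labels)
```

In this toy setup the annotator labels two samples and all six receive a label, which is the mechanism behind the small annotated fraction reported in the abstract.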
