
DISCOVER: Data-driven Identification of Sub-activities via Clustering and Visualization for Enhanced Activity Recognition in Smart Homes

Alexander Karpekov, Sonia Chernova, Thomas Plötz

TL;DR

DISCOVER tackles the high cost of labeled data and the need for flexible activity granularity in smart-home HAR with a self-supervised, two-stage pipeline that discovers fine-grained sub-activities from unlabeled ambient-sensor streams without pre-segmentation. It combines three components: a BERT-based encoder pre-trained with masked language modeling on sensor tokens, SCAN-based clustering that groups windows into coherent sub-activity clusters, and a visualization-driven workflow in which experts annotate cluster centroids whose labels are then propagated back to the full dataset. Evaluated on the CASAS Milan, Aruba, and Cairo datasets, DISCOVER yields semantically meaningful sub-activities that reveal finer distinctions within the coarse CASAS labels, while requiring annotation of only a small set of samples (0.05% of the dataset). The approach shows practical potential for scalable, adaptable HAR in real-world homes and comes with an open-source tool that supports annotation and re-annotation at varying granularities across diverse environments.
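The first step of the pipeline, slicing an unsegmented event stream into overlapping windows and preparing masked inputs for encoder pre-training, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: each sensor event is assumed to be a string token such as "M021_ON" (a hypothetical format), the window length, stride, and 15% masking rate are illustrative values, and masking is simplified to always substitute a [MASK] token (standard BERT also keeps or randomizes some of the selected tokens).

```python
import random

MASK = "[MASK]"  # assumed mask token name


def sliding_windows(events, window=20, stride=1):
    """Slice a continuous event stream into overlapping fixed-length windows (no pre-segmentation)."""
    return [events[i:i + window] for i in range(0, len(events) - window + 1, stride)]


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide a random subset of tokens; targets hold the original token where masked, else None."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)  # the encoder is trained to recover this token
        else:
            masked.append(tok)
            targets.append(None)  # not a prediction target
    return masked, targets


# Hypothetical token stream: sensor id + state for each activation event.
stream = ["M021_ON", "M021_OFF", "M019_ON", "M019_OFF", "D003_OPEN", "M012_ON"]
windows = sliding_windows(stream, window=4, stride=1)
print(len(windows))  # 3
masked, targets = mask_tokens(windows[0], mask_prob=0.15, seed=0)
```

With stride 1, every event starts a new window, so no boundary between activities has to be known in advance; the masked windows would then be fed to a BERT-style encoder trained to predict the hidden tokens.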

Abstract

Human Activity Recognition (HAR) using ambient sensors has great potential for practical applications, particularly in elder care and independent living. However, deploying HAR systems in real-world settings remains challenging due to the high cost of labeled data, the need for pre-segmented sensor streams, and the lack of flexibility in activity granularity. To address these limitations, we introduce DISCOVER, a method designed to discover fine-grained human sub-activities from unlabeled sensor data without relying on pre-segmentation. DISCOVER combines unsupervised feature extraction and clustering with a user-friendly visualization tool to streamline the labeling process. DISCOVER enables domain experts to efficiently annotate only a minimal set of representative cluster centroids, reducing the annotation workload to a small number of samples (0.05% of our dataset). We demonstrate DISCOVER's effectiveness through a re-annotation exercise on widely used HAR datasets, showing that it uncovers finer-grained activities and produces more nuanced annotations than traditional coarse labels. DISCOVER represents a step toward practical, deployable HAR systems that adapt to diverse real environments.
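The annotation shortcut described in the abstract, labeling only the windows nearest each cluster centroid and copying those labels to every member of the cluster, reduces to two small operations. The Python sketch below assumes window embeddings and cluster assignments are already available as NumPy arrays; the function names, the Euclidean distance to the centroid, and the example labels are illustrative assumptions rather than DISCOVER's published implementation.

```python
import numpy as np


def closest_to_centroids(embeddings, assignments, n_per_cluster=3):
    """Return, per cluster, the indices of the windows nearest the cluster centroid (Euclidean, assumed)."""
    picks = {}
    for c in np.unique(assignments):
        idx = np.flatnonzero(assignments == c)
        centroid = embeddings[idx].mean(axis=0)
        dists = np.linalg.norm(embeddings[idx] - centroid, axis=1)
        picks[int(c)] = idx[np.argsort(dists)[:n_per_cluster]]
    return picks


def propagate_labels(assignments, cluster_labels, default="unlabeled"):
    """Give every window the expert label chosen for its cluster."""
    return np.array([cluster_labels.get(int(c), default) for c in assignments], dtype=object)


# Toy example: six 2-D embeddings forming two clusters (illustrative data).
embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.2], [4.9, 5.0]])
assignments = np.array([0, 0, 1, 1, 0, 1])

picks = closest_to_centroids(embeddings, assignments, n_per_cluster=1)
# Experts review only the picked windows, then name each cluster (hypothetical labels).
expert_labels = {0: "Guest Bathroom: Walking In", 1: "TV Room: Sitting"}
labels = propagate_labels(assignments, expert_labels)
annotated_fraction = sum(len(v) for v in picks.values()) / len(assignments)
```

The number of reviewed windows is the number of clusters times the samples per cluster, independent of dataset size, which is why centroid-based annotation stays cheap on long recordings.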

Paper Structure

This paper contains 40 sections, 2 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: Overview of DISCOVER, a self-supervised system designed to discover fine-grained human activities from unlabeled sensor data without relying on pre-segmentation. It consists of two main stages: clustering and labeling. After (0) slicing the raw data into continuous sliding windows of sensor activations, without assuming any pre-segmentation, we (1) train a BERT model with a masked language modeling task to encode the windows in an embedding space. We then use these embeddings to identify similar activity windows and (2) fine-tune a clustering model using the SCAN loss, which assigns every data point to one of $k$ clusters. Next, we (3) sample a handful of windows closest to each cluster centroid, replay them on 2D house layouts using our custom-built visualization tool, and send these samples to a group of experts for annotation. With minimal labeling effort, we obtain custom, granular activity labels for each cluster centroid and (4) propagate them to the remaining data points in each cluster. These custom labels are then (5) applied to the original dataset and can later be used for specialized downstream tasks.
  • Figure 2: DISCOVER model training pipeline, consisting of (1) Encoder Pre-Training and (2) Clustering Model Fine-Tuning. DISCOVER first trains a BERT model with a Masked Language Modeling head in (1) to obtain initial embeddings for each window $W_i$. In step 2(a), it uses these embeddings to identify similar activity windows and pairs them together as a new training set. It then continues training the pre-trained BERT base model with a SCAN loss (2(b)). Finally, the trained SCAN model assigns a cluster $c_k$ to each input sequence $W_i$ (2(c)).
  • Figure 3: DISCOVER's custom-built, interactive in-browser annotation tool for reviewing sensor activation sequences. The tool displays a 2D house layout, allowing annotators to replay sequences over time, observe contextual and spatial details, and assign labels via a drop-down menu. The example above shows a sequence of sensor activations as a resident walks to the guest bathroom, which an annotator can label as "Guest Bathroom: Walking In" from the drop-down menu.
  • Figure 4: t-SNE projection of SCAN embeddings from the Milan household; each point represents a sensor window embedding colored by its original CASAS label. Insets display the deployment environment layout overlaid with a heatmap of sensor activations. Insets (a) and (b) highlight two distinct clusters within the CASAS Cook label: cluster 16, showing movement between the kitchen and dining areas, and cluster 5, capturing activity near the medicine cabinet. Insets (c) and (d) show clusters from the Relax label, corresponding to sitting in the TV room armchair and sitting in the living room armchair, respectively. Data associated with the Other label (gray) is dispersed across clusters, underscoring its heterogeneous nature. This figure showcases DISCOVER's ability to uncover more granular and nuanced activity categories than the original CASAS labels.
  • Figure 5: Macro F1 scores for varying numbers of SCAN clusters ($k$) on the Milan, Aruba, and Cairo datasets, using all CASAS labels (a) and excluding the Other label (b). The chart shows average F1 scores with bootstrapped 95% confidence intervals. Increasing $k$ up to 20-40 clusters improves alignment with the CASAS labels, after which the improvement stagnates, suggesting that the optimal number of clusters lies in that range.
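Steps 2(a) and 2(b) in Figure 2 follow the SCAN recipe (Van Gansbeke et al.): mine nearest neighbors in the pre-trained embedding space, then train the clustering head so that each window and its neighbors receive the same confident cluster assignment, while an entropy term keeps all clusters in use. The NumPy sketch below only evaluates that objective for given cluster probabilities; it illustrates the standard SCAN loss rather than DISCOVER's training code, which would compute the same quantity with gradients in a deep-learning framework. Cosine similarity for neighbor mining, the neighbor count, and the entropy weight are assumed values.

```python
import numpy as np


def mine_neighbors(embeddings, k=5):
    """Indices of each window's k most cosine-similar windows, excluding itself."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # a window is not its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]


def softmax(logits):
    """Row-wise softmax turning clustering-head logits into cluster probabilities."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def scan_loss(probs, neighbors, entropy_weight=5.0, eps=1e-8):
    """SCAN objective: neighbor consistency minus weighted entropy of the mean assignment."""
    # Consistency: -log of the dot product between a window's and each neighbor's probabilities.
    dots = np.einsum("nc,nkc->nk", probs, probs[neighbors])
    consistency = -np.log(dots + eps).mean()
    # Entropy of the average cluster distribution; rewarding it discourages collapse onto one cluster.
    mean_probs = probs.mean(axis=0)
    entropy = -np.sum(mean_probs * np.log(mean_probs + eps))
    return consistency - entropy_weight * entropy


# Toy data: two groups of windows in a 2-D embedding space (illustrative).
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
nbrs = mine_neighbors(emb, k=1)
good = softmax(np.array([[5.0, 0.0], [5.0, 0.0], [0.0, 5.0], [0.0, 5.0]]))
collapsed = softmax(np.array([[5.0, 0.0]] * 4))
print(scan_loss(good, nbrs) < scan_loss(collapsed, nbrs))  # True
```

The two toy assignments have identical consistency terms, so the entropy term is what ranks the balanced assignment lower (better); without it, collapsing every window into a single cluster would score just as well.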
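Figure 5 reports macro F1 against the CASAS labels with bootstrapped 95% confidence intervals. The caption does not say how clusters are matched to CASAS labels; one common choice, assumed here, is to map each cluster to the majority CASAS label among its members and then score the mapped predictions. The NumPy sketch below implements that assumption together with a percentile bootstrap over windows; the function names, resample count, and seed are illustrative.

```python
import numpy as np


def majority_map(clusters, labels):
    """Predict, for each window, the most frequent true label within its cluster (assumed matching rule)."""
    mapping = {}
    for c in np.unique(clusters):
        vals, counts = np.unique(labels[clusters == c], return_counts=True)
        mapping[c] = vals[np.argmax(counts)]
    return np.array([mapping[c] for c in clusters])


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN) over classes present in y_true."""
    scores = []
    for cls in np.unique(y_true):
        tp = np.sum((y_pred == cls) & (y_true == cls))
        fp = np.sum((y_pred == cls) & (y_true != cls))
        fn = np.sum((y_pred != cls) & (y_true == cls))
        scores.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(scores))


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for macro F1, resampling windows with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = [macro_f1(y_true[idx], y_pred[idx])
             for idx in (rng.integers(0, n, size=n) for _ in range(n_boot))]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Toy example (illustrative cluster ids and CASAS-style labels).
clusters = np.array([0, 0, 0, 1, 1, 2])
truth = np.array(["Cook", "Cook", "Relax", "Relax", "Relax", "Other"])
pred = majority_map(clusters, truth)
print(macro_f1(truth, pred))  # ≈ 0.867
print(bootstrap_ci(truth, pred))
```

Under the same assumptions, the setting of panel (b) could be approximated by dropping windows whose CASAS label is Other before mapping and scoring.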