"Filling the Blanks": Identifying Micro-activities that Compose Complex Human Activities of Daily Living

Soumyajit Chatterjee, Bivas Mitra, Sandip Chakraborty

TL;DR

This work tackles the challenge of decomposing complex ADLs into fine-grained micro-activities without requiring granular micro-annotations. It proposes AmicroN, a top-down framework that first segments macro-activities with unsupervised change-point detection and then uses generalized zero-shot learning with latent semantic embeddings to recognize micro-activities. Evaluations on the Kitchen and LARa datasets show robust micro-activity identification (median $F_1$ around $0.75$) and accurate change-point detection, while a qualitative LLM analysis demonstrates potential for enhanced explainability. The approach offers a cost-effective, explainable, and sensor-robust path toward richer HAR models, with future multimodal extensions and LLM-based virtual supervision highlighted as promising directions.

Abstract

Complex activities of daily living (ADLs) often consist of multiple micro-activities. When performed sequentially, these micro-activities help the user accomplish the broad macro-activity. Naturally, a deeper understanding of these micro-activities can help develop more sophisticated human activity recognition (HAR) models and add explainability to their inferred conclusions. Previous research has attempted to achieve this by utilizing fine-grained annotated data that provided the required supervision and rules for associating the micro-activities to identify the macro-activity. However, this "bottom-up" approach is unrealistic, as obtaining such high-quality, fine-grained annotated sensor datasets is challenging, costly, and time-consuming. In this paper, we therefore develop AmicroN, which adopts a "top-down" approach by exploiting coarse-grained annotated data to expand the macro-activities into their constituent micro-activities without any external supervision. In the backend, AmicroN uses unsupervised change-point detection to search for the micro-activity boundaries across a complex ADL. Then, it applies a generalized zero-shot approach to characterize the micro-activity in each resulting segment. We evaluate AmicroN on two real-life publicly available datasets and observe that AmicroN can identify the micro-activities with micro $F_1$-score $>0.75$ for both datasets. Additionally, we perform an initial proof-of-concept on leveraging state-of-the-art (SOTA) large language models (LLMs) with attribute embeddings predicted by AmicroN to further enhance the explainability surrounding the detection of micro-activities.

Paper Structure (41 sections, 7 figures, 4 tables)

This paper contains 41 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: (a) Conventional bottom-up approach of using granular micro-activity labeled data to identify the macro-activity label vs. (b) our proposed top-down approach, named AmicroN, which decomposes coarse-grained macro-activity labeled data and identifies the hidden fine-grained micro-activities.
  • Figure 2: Issues with coarse-grain labeling in the HS dataset -- Changes in (a) furniture sensor for the activity class and (b) raw actimetry data for the activity class "Dressing/Undressing".
  • Figure 3: Issues of subjectivity in the annotations of the kitchen dataset -- Variation of the accelerometer signatures for the subjects (b) S11 and (c) S06 for the activity label "pour big bowl into baking pan."
  • Figure 4: (a) High-level view of AmicroN; (b) The zero-shot model $\mathbb{Z}$ during training phase. Here, $d$ and $\mathbb{N}$ represent the dimension of sensor data after dimensionality reduction and the dimension of the semantic representations, respectively.
  • Figure 5: (a) F1-score of micro-activity prediction for the kitchen dataset and Top-5 closest verbs (right) in embedding space for the ground-truth micro-activities (left) (b) S06: "pour big bowl into baking pan" and (c) S02: "pour oil into measuring cup small"
  • ...and 2 more figures