
ActivityNarrated: An Open-Ended Narrative Paradigm for Wearable Human Activity Understanding

Lala Shakti Swarup Ray, Mengxi Liu, Alcina Pinto, Deepika Gurung, Daniel Geissler, Paul Lukowicz, Bo Zhou

Abstract

Wearable human activity recognition (HAR) has improved steadily, but most progress still relies on closed-set classification, which limits real-world use. In practice, human activity is open-ended, unscripted, personalized, and often compositional, unfolding as narratives rather than instances of fixed classes. We argue that addressing this gap does not require simply scaling datasets or models. It requires a fundamental shift in how wearable HAR is formulated, supervised, and evaluated. This work shows how to model open-ended activity narratives by aligning wearable sensor data with natural-language descriptions in an open-vocabulary setting. Our framework has three core components. First, we introduce a naturalistic data collection and annotation pipeline that combines multi-position wearable sensing with free-form, time-aligned narrative descriptions of ongoing behavior, allowing activity semantics to emerge without a predefined vocabulary. Second, we define a retrieval-based evaluation framework that measures semantic alignment between sensor data and language, enabling principled evaluation without fixed classes while also subsuming closed-set classification as a special case. Third, we present a language-conditioned learning architecture that supports sensor-to-text inference over variable-length sensor streams and heterogeneous sensor placements. Experiments show that models trained with fixed-label objectives degrade sharply under real-world variability, while open-vocabulary sensor-language alignment yields robust and semantically grounded representations. Once this alignment is learned, closed-set activity recognition becomes a simple downstream task. Under cross-participant evaluation, our method achieves 65.3% Macro-F1, compared with 31-34% for strong closed-set HAR baselines. These results establish open-ended narrative modeling as a practical and effective foundation for real-world wearable HAR.
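
The retrieval-based evaluation described above can be illustrated with a minimal sketch. The embeddings below are random stand-ins (the paper's actual encoders, dimensions, and candidate pools are not specified here); the point is the protocol: score every sensor query against every candidate description by cosine similarity, report recall@k, and recover closed-set classification by restricting the candidate pool to one prompt per class.

```python
import numpy as np

# Hypothetical embeddings standing in for sensor windows and narrative
# descriptions mapped into a shared space (random here, for illustration only).
rng = np.random.default_rng(0)
n_queries, n_texts, dim = 4, 10, 32
sensor_emb = rng.normal(size=(n_queries, dim))
text_emb = rng.normal(size=(n_texts, dim))

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Cosine similarity between every sensor query and every candidate description.
sim = normalize(sensor_emb) @ normalize(text_emb).T  # shape (n_queries, n_texts)

def recall_at_k(sim, k):
    # Fraction of queries whose ground-truth description (assumed to be
    # candidate i for query i) appears among the top-k retrieved candidates.
    topk = np.argsort(-sim, axis=1)[:, :k]
    gt = np.arange(sim.shape[0])[:, None]
    return float(np.mean(np.any(topk == gt, axis=1)))

r1 = recall_at_k(sim, 1)
r5 = recall_at_k(sim, 5)

# Closed-set classification as a special case: limit candidates to a small set
# of per-class label prompts and take the argmax similarity as the prediction.
pred = sim[:, :3].argmax(axis=1)
```

Because recall@k is monotone in k, recall@1 never exceeds recall@5; this is why the retrieval formulation can subsume closed-set accuracy without a separate metric.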

Paper Structure

This paper contains 95 sections, 10 equations, 8 figures, and 12 tables.

Figures (8)

  • Figure 1: Conceptual comparison between prevailing wearable HAR pipelines and the open-vocabulary paradigm introduced in this work. Prior approaches rely on fixed-size windows, standardized sensor layouts, and closed-set classification, leading to brittle performance under sensor shift, missing sensors, and long-tailed behaviors. In contrast, the proposed open-vocabulary paradigm models activities as open-ended narratives by grounding variable-length, multi-position sensor streams in a language embedding space, subsuming closed-set HAR classification as a special narrow scope downstream task.
  • Figure 2: Room-scale layout with seven activity hotspots and two fixed camera viewpoints covering the full environment.
  • Figure 3: IMU data collection device design and on-body sensor layout.
  • Figure 4: Spectral VQ-VAE for IMU tokenization. IMU streams are split into chunks and processed in three views (time, STFT, wavelet). Three encoders produce latent embeddings that are fused and quantized via a shared codebook. The decoder reconstructs the time-domain signal and optionally spectral targets.
  • Figure 5: Sensor-to-LLM generation. Token sequences from an arbitrary subset of IMU positions are embedded and fused via a Q-Former into fixed-size sensor prompts. A textual header specifies positions and duration. A frozen LLM generates open-vocabulary activity descriptions conditioned on both prompts and header.
  • ...and 3 more figures
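
The quantization step described in the Figure 4 caption (fused encoder embeddings snapped to a shared codebook) follows the standard vector-quantization lookup, sketched below. Codebook size, latent dimension, and chunk count are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

# Toy sketch of VQ-VAE quantization: each fused latent embedding is replaced by
# its nearest entry in a shared codebook, yielding a discrete token id per chunk.
rng = np.random.default_rng(1)
codebook = rng.normal(size=(64, 16))   # 64 codes, 16-dim latents (assumed sizes)
latents = rng.normal(size=(8, 16))     # fused embeddings for 8 IMU chunks

# Squared L2 distance from every latent to every codebook entry, then argmin.
d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (8, 64)
tokens = d2.argmin(axis=1)             # discrete token ids, one per chunk
quantized = codebook[tokens]           # quantized embeddings fed to the decoder
```

The resulting token ids are what downstream components such as the Figure 5 sensor-to-LLM stage would consume; the continuous `quantized` vectors go to the reconstruction decoder.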