Towards LLM-Powered Ambient Sensor Based Multi-Person Human Activity Recognition

Xi Chen, Julien Cumin, Fano Ramparany, Dominique Vaufreydaz

TL;DR

The paper tackles multi-person ambient sensor HAR (AHAR) by proposing LAHAR, an LLM-powered two-stage framework that converts sensor events into text, separates subjects, and generates action-level descriptions followed by activity-level reasoning to produce per-subject timelines. It leverages prompt engineering, in-context learning, and contextual prompts to address data scarcity, generalization, and explainability, mapping a sequence $S_T$ of sensor events $e_t=\langle t,s,c\rangle$ to activity timelines $A_T$. Evaluated on the ARAS dataset, LAHAR achieves results at high temporal resolution that are comparable to state-of-the-art methods and demonstrates robust multi-subject separation, with qualitative demonstrations of per-subject activity descriptions. The approach highlights the potential of LLMs for explainable, privacy-preserving ambient HAR and suggests directions for broader LLM adoption and enhanced conversational explainability in real-world settings.
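
As a rough illustration of the first step (turning sensor events into text an LLM can read), here is a minimal Python sketch. It is not the authors' implementation. The summary does not define the fields of $e_t=\langle t,s,c\rangle$, so the sketch assumes that t is a timestamp in seconds since midnight, s is a sensor name, and c is a binary reading. The sensor names and the output wording are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SensorEvent:
    """One event e_t = <t, s, c>; the field meanings are assumptions."""
    t: int  # timestamp, assumed to be seconds since midnight
    s: str  # sensor name (hypothetical naming)
    c: int  # sensor reading, assumed binary: 1 = active, 0 = inactive


def format_clock(seconds: int) -> str:
    """Render seconds since midnight as HH:MM:SS."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def event_to_text(event: SensorEvent) -> str:
    """Describe a single event as one line of plain text."""
    state = "activated" if event.c else "deactivated"
    return f"[{format_clock(event.t)}] {event.s} {state}"


def sequence_to_text(events: Iterable[SensorEvent]) -> str:
    """Turn a sequence S_T into time-ordered text, one event per line."""
    ordered = sorted(events, key=lambda e: e.t)
    return "\n".join(event_to_text(e) for e in ordered)


if __name__ == "__main__":
    demo = [
        SensorEvent(27005, "kitchen motion sensor", 1),
        SensorEvent(26990, "fridge door contact", 1),
    ]
    print(sequence_to_text(demo))
```

The resulting text could then be placed in a prompt for the subject-separation and action-description stage. How LAHAR actually phrases its prompts is not specified in this summary.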

Abstract

Human Activity Recognition (HAR) is one of the central problems in fields such as healthcare, elderly care, and security at home. However, traditional HAR approaches face challenges including data scarcity, difficulties in model generalization, and the complexity of recognizing activities in multi-person scenarios. This paper proposes a system framework called LAHAR, based on large language models. Utilizing prompt engineering techniques, LAHAR addresses HAR in multi-person scenarios by enabling subject separation and action-level descriptions of events occurring in the environment. We validated our approach on the ARAS dataset, and the results demonstrate that LAHAR achieves comparable accuracy to the state-of-the-art method at higher resolutions and maintains robustness in multi-person scenarios.

Paper Structure

This paper contains 32 sections, 1 equation, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Workflow for our proposed LLM-based AHAR framework: LAHAR.
  • Figure 2: Example of outputs generated by LAHAR at each stage.
  • Figure 3: Comparison of confusion matrices between our method and the method of Gao et al. (gao2024unsupervised). The categories are: X) Unknown, A) Preparing Breakfast, B) Having Breakfast, C) Preparing Lunch, D) Having Lunch, E) Preparing Dinner, F) Having Dinner, G) Sleeping, H) Having Shower, I) Toileting.
  • Figure 4: Confusion matrices of single-user activity recognition.