Layout Agnostic Human Activity Recognition in Smart Homes through Textual Descriptions Of Sensor Triggers (TDOST)

Megha Thukral, Sourish Gunesh Dhekane, Shruthi K. Hiremath, Harish Haresamudram, Thomas Ploetz

TL;DR

This work tackles the challenge of deploying HAR systems across smart homes with varying floor plans and sensor layouts by introducing TDOST, a framework that converts raw sensor triggers into contextual textual descriptions. By combining pre-trained language-model embeddings with an inference pipeline that is kept frozen, TDOST enables layout-agnostic transfer of activity recognizers from a labeled source home to unseen target homes without retraining or adaptation on target data. The paper systematically compares four TDOST variants (Basic, Temporal, LLM, and LLM+Temporal) and demonstrates substantial cross-dataset gains on CASAS benchmarks, with Sentence-T5 encoders and ConvBi-LSTM classifiers delivering the strongest transfer performance. It also discusses explainability, maintenance for lifelong deployments, and potential extensions to multi-source training and few-shot activity transfer, highlighting practical implications for scalable smart-home HAR deployment.
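To make the idea of a textual description of a sensor trigger concrete, the following is a minimal sketch of a template-based conversion. The sentence template, the `SensorMeta` type, the `describe_trigger` helper, and the sensor IDs are all assumptions made for illustration; they are not taken from the paper, whose exact templates may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorMeta:
    """Hypothetical per-sensor metadata, read off a floor plan."""
    modality: str  # e.g. "motion" or "door"
    location: str  # symbolic location, e.g. "kitchen"


def describe_trigger(sensor_id, state, metadata):
    """Turn one raw trigger (sensor ID + state) into a sentence.

    The wording of this template is an assumption, not the paper's.
    """
    meta = metadata[sensor_id]
    return (
        f"{meta.modality.capitalize()} sensor in the {meta.location} "
        f"fired with value {state}"
    )


# Two homes with different sensor IDs but the same symbolic context.
source_home = {"M003": SensorMeta("motion", "kitchen")}
target_home = {"M017": SensorMeta("motion", "kitchen")}

source_text = describe_trigger("M003", "ON", source_home)
target_text = describe_trigger("M017", "ON", target_home)
print(source_text)
```

In this sketch the home-specific sensor ID never appears in the output, so triggers from differently wired homes map to the same sentence whenever their modality, location, and state agree.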

Abstract

Human activity recognition (HAR) using ambient sensors in smart homes has numerous applications for human healthcare and wellness. However, building general-purpose HAR models that can be deployed to new smart home environments requires a significant amount of annotated sensor data and training overhead. Most smart homes vary significantly in their layouts, i.e., floor plans and the specifics of sensors embedded, resulting in low generalizability of HAR models trained for specific homes. We address this limitation by introducing a novel, layout-agnostic modeling approach for HAR systems in smart homes that utilizes the transferable representational capacity of natural language descriptions of raw sensor data. To this end, we generate Textual Descriptions Of Sensor Triggers (TDOST) that encapsulate the surrounding trigger conditions and provide cues for underlying activities to the activity recognition models. Leveraging textual embeddings, rather than raw sensor data, we create activity recognition systems that predict standard activities across homes without either (re-)training or adaptation on target homes. Through an extensive evaluation on benchmark CASAS datasets, we demonstrate the effectiveness of TDOST-based models in unseen smart homes. Furthermore, we conduct a detailed analysis of how the individual components of our approach affect downstream activity recognition performance.
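The train-on-source, deploy-frozen-on-target flow described in the abstract can be illustrated with a toy sketch. Everything here is a stand-in chosen so the example runs with the standard library only: `frozen_encode` is a fixed-vocabulary bag-of-words function standing in for a pre-trained sentence encoder, and a nearest-centroid rule stands in for the learned classifier. The vocabulary, sentences, and activity labels are hypothetical, and none of this is the paper's actual model.

```python
import math

# Fixed vocabulary: plays the role of a frozen, pre-trained encoder.
VOCAB = ("motion", "door", "kitchen", "bedroom", "on", "off", "open", "closed")


def frozen_encode(sentence):
    """Map a sentence to an L2-normalised count vector over VOCAB."""
    words = sentence.lower().split()
    vec = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def encode_window(descriptions):
    """Average the sentence embeddings of one activity window."""
    vecs = [frozen_encode(d) for d in descriptions]
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def train_centroids(windows, labels):
    """Source-home training: one mean embedding per activity label."""
    grouped = {}
    for window, label in zip(windows, labels):
        grouped.setdefault(label, []).append(encode_window(window))
    return {
        label: [sum(column) / len(embs) for column in zip(*embs)]
        for label, embs in grouped.items()
    }


def predict(centroids, window):
    """Target-home inference: nothing is re-trained or adapted."""
    emb = encode_window(window)
    return max(
        centroids,
        key=lambda label: sum(c * e for c, e in zip(centroids[label], emb)),
    )


# Labeled windows from a hypothetical source home.
source_windows = [
    ["Motion sensor in the kitchen fired with value ON",
     "Door sensor in the kitchen fired with value OPEN"],
    ["Motion sensor in the bedroom fired with value ON",
     "Motion sensor in the bedroom fired with value OFF"],
]
source_labels = ["cooking", "sleeping"]
centroids = train_centroids(source_windows, source_labels)

# Unlabeled windows from a hypothetical target home with another layout.
target_kitchen = [
    "Door sensor in the kitchen fired with value OPEN",
    "Motion sensor in the kitchen fired with value ON",
    "Motion sensor in the kitchen fired with value OFF",
]
target_bedroom = ["Motion sensor in the bedroom fired with value ON"]

print(predict(centroids, target_kitchen), predict(centroids, target_bedroom))
```

The point of the sketch is only the data flow: the classifier sees embeddings of descriptions rather than sensor IDs, so the source-trained model can be applied to the target windows unchanged.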

Paper Structure (38 sections, 6 figures, 8 tables)

Figures (6)

  • Figure 1: Overview of layout-agnostic HAR. Our approach derives HAR models on a given source home that can be deployed in new target homes while remaining agnostic, at the modeling level, to changes in the overall floor plan. At the heart of our approach lies the construction of textual descriptions of raw sensor triggers (TDOST), which encode readily available context information such as symbolic sensor locations and sensor modalities. Layout-agnostic HAR works in two phases: (i) supervised training of a HAR model in a source home using labeled activity data (top part), and (ii) deployment of the trained source model to new target homes (bottom part). Source data is first converted to textual descriptions, which are ingested by frozen pre-trained sentence encoders. The source embeddings are further adapted to smart home activities using a feature encoder and a linear classification layer. To predict the standard set of activities (common across smart homes) in a new home, the trained source model is deployed "as is" without any adaptation or re-training. During the inference stage in target homes, the test activity sensor data is converted to textual descriptions in the same way as in the source smart home, using easy-to-obtain meta-information such as the floor plan and sensor specifications. The embeddings of the target TDOST, generated by the sentence encoders, are then transformed using the feature encoder and classifier head trained in the source home. During inference, all models, including the sentence encoder, feature encoder, and classification head, are kept frozen.
  • Figure 2: Comparison of the DistilRoBERTa and Sentence-T5 variants of the Sentence Transformer models in the TDOST pipeline. The Sentence-T5 variant, which is an encoder-decoder transformer, significantly outperforms DistilRoBERTa across all source-target settings. The improvements are especially noticeable in the more challenging transfer settings involving the Cairo and Kyoto7 datasets.
  • Figure 3: Comparison of the Bi-LSTM and ConvBi-LSTM classifiers in the TDOST pipeline. The ConvBi-LSTM classifier, which has more parameters, significantly outperforms the Bi-LSTM classifier and improves performance in almost all source-target transfer settings. The improvements are especially noticeable in the more challenging transfer settings involving the Cairo and Kyoto7 datasets.
  • Figure 4: Aruba sensor layout and floorplan (taken with permission from hiremath2022bootstrapping and cook2012casas)
  • Figure 5: Milan sensor layout and floorplan (taken with permission from hiremath2022bootstrapping and cook2012casas)
  • ...and 1 more figures