AgentSense: Virtual Sensor Data Generation Using LLM Agents in Simulated Home Environments

Zikang Leng, Megha Thukral, Yaqi Liu, Hrudhai Rajasekhar, Shruthi K. Hiremath, Jiaman He, Thomas Plötz

TL;DR

AgentSense tackles HAR data scarcity by simulating diverse daily routines of virtual residents in smart homes, with their behavior guided by LLMs. Multi-level prompting turns LLM-generated personas and routines into executable actions, which are run in X-VirtualHome, an augmented VirtualHome simulator that records rich, privacy-preserving ambient sensor data. Pretraining HAR classifiers on this virtual data improves performance across five real datasets, particularly in low-data regimes, and combining it with a small amount of real data achieves performance comparable to baselines trained on the full real datasets. The work demonstrates a scalable, privacy-preserving data-generation paradigm for HAR and highlights directions for personalization and multi-modal sensor synthesis.
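The TL;DR mentions multi-level prompting that produces executable actions grounded in the simulated environment. A minimal sketch of one piece of that idea follows: checking LLM-proposed steps against a scene description before they reach the simulator. The scene dictionary, the verb list, the `Step`/`ground` names, and the rule that interactions must target objects in the most recently visited room are all hypothetical assumptions for illustration, not AgentSense's actual code; the output strings only imitate VirtualHome's `[Verb] <object> (id)` script notation.

```python
from dataclasses import dataclass

# Hypothetical scene description: room name -> objects present in that room.
SCENE: dict[str, set[str]] = {
    "kitchen": {"fridge", "stove", "mug"},
    "bedroom": {"bed", "lamp"},
    "bathroom": {"toilet", "sink"},
}

# Hypothetical subset of executable verbs, modeled loosely on VirtualHome action names.
ALLOWED_VERBS = {"Walk", "Grab", "Open", "Close", "SwitchOn", "SwitchOff", "Sit", "Lie"}


@dataclass(frozen=True)
class Step:
    verb: str
    target: str  # a room name for "Walk", otherwise an object name


def to_script_line(step: Step, instance_id: int = 1) -> str:
    """Format a step in VirtualHome-style notation, e.g. '[Walk] <kitchen> (1)'."""
    return f"[{step.verb}] <{step.target}> ({instance_id})"


def ground(steps: list[Step], scene: dict[str, set[str]]) -> tuple[list[str], list[Step]]:
    """Split LLM-proposed steps into executable script lines and rejected steps.

    A step is kept only if its verb is allowed and its target exists:
    "Walk" must name a room in the scene; any other verb must name an object
    in the room the agent most recently walked to.
    """
    script: list[str] = []
    rejected: list[Step] = []
    current_room = None
    for step in steps:
        if step.verb not in ALLOWED_VERBS:
            rejected.append(step)
        elif step.verb == "Walk":
            if step.target in scene:
                current_room = step.target
                script.append(to_script_line(step))
            else:
                rejected.append(step)
        elif current_room is not None and step.target in scene[current_room]:
            script.append(to_script_line(step))
        else:
            rejected.append(step)
    return script, rejected


if __name__ == "__main__":
    proposed = [
        Step("Walk", "kitchen"), Step("Open", "fridge"), Step("Grab", "mug"),
        Step("Grab", "toaster"),  # not in the kitchen, so rejected
        Step("Walk", "garage"),   # no such room, so rejected
        Step("Walk", "bedroom"), Step("Lie", "bed"),
    ]
    script, rejected = ground(proposed, SCENE)
    print("\n".join(script))
    print("rejected:", rejected)
```

Returning the rejected steps separately, rather than silently dropping them, would let a pipeline of this kind feed them back to the LLM for repair; whether AgentSense does anything like this is not stated here.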

Abstract

A major challenge in developing robust and generalizable Human Activity Recognition (HAR) systems for smart homes is the lack of large and diverse labeled datasets. Variations in home layouts, sensor configurations, and individual behaviors further exacerbate this issue. To address this, we leverage the idea of embodied AI agents -- virtual agents that perceive and act within simulated environments guided by internal world models. We introduce AgentSense, a virtual data generation pipeline in which agents live out daily routines in simulated smart homes, with behavior guided by Large Language Models (LLMs). The LLM generates diverse synthetic personas and realistic routines grounded in the environment, which are then decomposed into fine-grained actions. These actions are executed in an extended version of the VirtualHome simulator, which we augment with virtual ambient sensors that record the agents' activities. Our approach produces rich, privacy-preserving sensor data that reflects real-world diversity. We evaluate AgentSense on five real HAR datasets. Models pretrained on the generated data consistently outperform baselines, especially in low-resource settings. Furthermore, combining the generated virtual sensor data with a small amount of real data achieves performance comparable to training on full real-world datasets. These results highlight the potential of using LLM-guided embodied agents for scalable and cost-effective sensor data generation in HAR. Our code is publicly available at https://github.com/ZikangLeng/AgentSense.
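The abstract describes virtual ambient sensors that record the agents' activities. As a rough illustration, not the X-VirtualHome implementation, the sketch below converts a room-level trajectory into time-stamped events in a CASAS-like `date time sensor value` text format, with motion sensors reporting ON/OFF and door-style contact sensors reporting OPEN/CLOSE. The one-sensor-per-room placement, the sensor IDs, the `Visit` structure, and the timing rule for contact events are hypothetical; real deployments typically place several motion sensors per room.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical sensor placement: one motion sensor per room, one contact sensor per object.
MOTION_SENSORS = {"kitchen": "M001", "bedroom": "M002", "bathroom": "M003"}
CONTACT_SENSORS = {"fridge": "D001"}


@dataclass
class Visit:
    room: str
    start: datetime
    end: datetime
    touched: tuple[str, ...] = ()  # objects the agent interacts with during the visit


def sensor_events(visits: list[Visit]) -> list[tuple[datetime, str, str]]:
    """Turn a room-level trajectory into time-ordered (timestamp, sensor_id, value) events."""
    events = []
    for v in visits:
        motion = MOTION_SENSORS.get(v.room)
        if motion is not None:
            events.append((v.start, motion, "ON"))
            events.append((v.end, motion, "OFF"))
        # Assumed timing: open shortly after entering, close shortly before leaving,
        # with the offset capped at a quarter of the visit so OPEN never comes after CLOSE.
        offset = min(timedelta(seconds=5), (v.end - v.start) / 4)
        for obj in v.touched:
            contact = CONTACT_SENSORS.get(obj)
            if contact is not None:
                events.append((v.start + offset, contact, "OPEN"))
                events.append((v.end - offset, contact, "CLOSE"))
    # Stable sort: events with equal timestamps keep insertion order, so with
    # chronologically ordered visits one visit's OFF stays before the next visit's ON.
    events.sort(key=lambda e: e[0])
    return events


def to_casas_lines(events: list[tuple[datetime, str, str]]) -> list[str]:
    """Render events as 'YYYY-MM-DD HH:MM:SS SENSOR VALUE' lines."""
    return [f"{t:%Y-%m-%d %H:%M:%S} {sid} {value}" for t, sid, value in events]


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 7, 0, 0)
    morning = [
        Visit("bedroom", t0, t0 + timedelta(minutes=1)),
        Visit("kitchen", t0 + timedelta(minutes=1), t0 + timedelta(minutes=3), ("fridge",)),
    ]
    print("\n".join(to_casas_lines(sensor_events(morning))))
```

For this two-visit morning the sketch prints six lines, from `2024-01-01 07:00:00 M002 ON` to `2024-01-01 07:03:00 M001 OFF`, with the fridge's OPEN/CLOSE pair nested inside the kitchen visit.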

Paper Structure

This paper contains 42 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Overview of the framework. The LLM first generates diverse synthetic personas. For each persona, it then produces daily routines grounded in the context of a simulated environment. These routines are decomposed into fine-grained actions, which are executed in the X-VirtualHome simulator. The simulator, augmented with ambient sensors, captures virtual sensor data as the agent enacts its daily life.
  • Figure 2: Performance of the TDOST Basic model when different amounts of real data are used for training; the amount of virtual data is held constant.
  • Figure 3: Comparison between a VirtualHome environment and the Milan dataset. VirtualHome layout image from virtualhome2024; Milan layout taken from cook2012casas.
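Figure 2 varies the amount of real training data while the virtual data stays fixed. One way to set up such a sweep is sketched below, under assumptions the source does not confirm: real training samples are subsampled per activity class, at least one example per class is kept, and a fixed seed makes smaller subsets nested inside larger ones, so results across fractions differ only by added data. The function names and the `(features, label)` sample format are hypothetical; the paper's actual sampling protocol may differ.

```python
import random
from collections import defaultdict


def stratified_subsample(samples, fraction, seed=0):
    """Keep roughly `fraction` of each class from a list of (features, label) pairs.

    At least one example per class is kept so every activity label appears during
    fine-tuning. Each class is shuffled with a generator seeded identically on every
    call, so for a fixed seed the subset for a smaller fraction is contained in the
    subset for a larger one.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    rng = random.Random(seed)
    subset = []
    for label in sorted(by_label):  # fixed class order keeps the shuffles reproducible
        items = list(by_label[label])
        rng.shuffle(items)
        k = max(1, round(fraction * len(items)))
        subset.extend(items[:k])
    return subset


def experiment_plan(virtual, real_train, fractions, seed=0):
    """Pair one fixed virtual pretraining set with real fine-tuning subsets of growing size."""
    return [
        {"fraction": f, "pretrain": virtual,
         "finetune": stratified_subsample(real_train, f, seed)}
        for f in sorted(fractions)
    ]


if __name__ == "__main__":
    virtual = [(f"v{i}", "cook") for i in range(100)]  # placeholder virtual samples
    real_train = [(i, "cook") for i in range(10)] + [(i, "sleep") for i in range(4)]
    for run in experiment_plan(virtual, real_train, [0.1, 0.5, 1.0]):
        print(run["fraction"], len(run["pretrain"]), len(run["finetune"]))
```

With 10 "cook" and 4 "sleep" real samples, the fine-tuning sets have 2, 7, and 14 samples at fractions 0.1, 0.5, and 1.0, while the virtual pretraining set is the same object in every run. Nesting the subsets is a design choice that makes the curve in a plot like Figure 2 less sensitive to which particular real samples happened to be drawn.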