Realistic Synthetic Household Data Generation at Scale

Siddharth Singh, Ifrah Idrees, Abraham Dauhajre

TL;DR

This work generates realistic synthetic household data at scale by coupling environment generation with long-term human activity and human-robot interaction (HRI) data in a bidirectional, iterative framework. It combines persona-driven environment schematics, temporally coherent activity generation, and a feedback controller that keeps the modalities semantically grounded in one another, which supports sim-to-real validation and large-scale dataset creation. Statistical analyses using multi-modal embeddings and mutual information indicate strong semantic alignment and effective mediation between personas, environments, and behaviors, and comparison against the real-world HOMER dataset shows good alignment (cosine similarity 0.60). The framework is intended for developing and testing intelligent household devices, balancing scalability with semantic fidelity and laying groundwork for more robust embodied AI data ecosystems.

Abstract

Advancements in foundation models have catalyzed research in Embodied AI to develop interactive agents capable of environmental reasoning and interaction. Developing such agents requires diverse, large-scale datasets. Prior frameworks generate synthetic data for long-term human-robot interactions but fail to model the bidirectional influence between human behavior and household environments. Our proposed generative framework creates household datasets at scale through loosely coupled generation of long-term human-robot interactions and environments. Human personas influence environment generation, while environment schematics and semantics shape human-robot interactions. The generated 3D data includes rich static context, such as object and environment semantics, and temporal context capturing human and agent behaviors over extended periods. Our flexible tool lets users configure both the environment and the human activity data through natural language prompts. The tool then creates variations of these user-defined configurations, enabling scalable data generation. We validate our framework through statistical evaluation using multi-modal embeddings and four key metrics: cosine similarity, mutual information gain, intervention analysis, and iterative improvement validation. Statistical comparisons show good alignment with the real-world HOMER dataset (cosine similarity 0.60), while the synthetic dataset of Wang et al. shows moderate alignment (0.27). Intervention analysis across age, organization, and sleep pattern changes shows statistically significant effects (p < 0.001) with large effect sizes (Cohen's d = 0.51-1.12), confirming that the bidirectional coupling translates persona traits into measurable environmental and behavioral differences. These contributions enable development and testing of household smart devices at scale.
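
To make two of the reported metrics concrete, the following is a minimal sketch of how cosine similarity between embedding vectors and Cohen's d between two groups of scores are commonly computed. It is not the authors' evaluation code: the function names and the toy inputs are assumptions for illustration, and Cohen's d is shown in its standard pooled-standard-deviation form, which the abstract does not specify.

```python
import math
from statistics import mean, variance


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_b) - mean(group_a)) / math.sqrt(pooled_var)


# Hypothetical toy inputs, not values from the paper.
synthetic_embedding = [1.0, 0.0]
real_embedding = [1.0, 1.0]
print(cosine_similarity(synthetic_embedding, real_embedding))

baseline_scores = [1.0, 2.0, 3.0]
intervention_scores = [2.0, 3.0, 4.0]
print(cohens_d(baseline_scores, intervention_scores))
```

In an evaluation of this kind, the vectors would be multi-modal embeddings of generated and reference data, and the two groups would be measurements taken before and after a persona intervention such as a change in age or sleep pattern.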

Paper Structure (20 sections, 1 equation, 4 figures, 7 tables, 3 algorithms)

This paper contains 20 sections, 1 equation, 4 figures, 7 tables, 3 algorithms.

Figures (4)

  • Figure 1: Framework Pipeline Overview: Our bidirectional generation framework comprises three primary modules operating in an iterative refinement cycle. The Environment Schematic Generator produces 3D household layouts based on persona-driven requirements. The Human Activity and HRI Generator synthesizes temporally consistent behavior sequences. The Bidirectional Influence Controller orchestrates iterative information exchange between the modules.
  • Figure 2: Input Specification and Contextual Memory Framework: Our system accepts structured natural language descriptions of household member personas and environmental constraints. The framework maintains contextual memory across the pipeline, providing the LLM with context regarding task requirements and completed steps to reduce hallucination issues.
  • Figure 3: Human Activity and Interaction Generation Pipeline
  • Figure 4: Intervention Analysis t-SNE Cluster Visualizations: Clear cluster separation validates the framework's ability to generate distinct persona-driven patterns across different intervention conditions.
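
The iterative refinement cycle described in the Figure 1 caption can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's implementation: every function name is hypothetical, the two generators are stubs standing in for the LLM-backed modules, and the feedback rule (report objects that the persona needs but the environment lacks) is invented purely to show information flowing back from the controller to the environment generator.

```python
from dataclasses import dataclass, field


@dataclass
class HouseholdState:
    persona: str
    environment: list = field(default_factory=list)
    activities: list = field(default_factory=list)


def generate_environment(persona, current_objects, feedback):
    """Stub for the Environment Schematic Generator (hypothetical logic)."""
    base_objects = ["bed", "kitchen counter"]
    return sorted(set(base_objects) | set(current_objects) | set(feedback))


def generate_activities(persona, environment):
    """Stub for the Human Activity and HRI Generator (hypothetical logic)."""
    return [f"use {obj}" for obj in environment]


def controller_feedback(environment, required_objects):
    """Stub for the Bidirectional Influence Controller: list missing objects."""
    return [obj for obj in required_objects if obj not in environment]


def run_pipeline(persona, required_objects, max_rounds=3):
    """Alternate between the two generators until the controller has no feedback."""
    state = HouseholdState(persona)
    feedback = []
    for _ in range(max_rounds):
        state.environment = generate_environment(persona, state.environment, feedback)
        state.activities = generate_activities(persona, state.environment)
        feedback = controller_feedback(state.environment, required_objects)
        if not feedback:
            break
    return state


# Hypothetical persona and requirement, chosen only for illustration.
result = run_pipeline("remote worker who keeps late hours", ["desk"])
print(result.environment)
print(result.activities)
```

The point of the sketch is the control flow: the environment is generated first, activities are grounded in it, and the controller's feedback drives the next round of environment generation until the two agree or a round limit is reached.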