Realistic Synthetic Household Data Generation at Scale
Siddharth Singh, Ifrah Idrees, Abraham Dauhajre
TL;DR
This work addresses the generation of realistic synthetic household data at scale by coupling environment generation with long-term human activity and human-robot interaction (HRI) data in a bidirectional, iterative framework. It combines persona-driven environment schematics, temporally coherent activity generation, and a feedback controller that keeps the modalities semantically grounded, enabling sim-to-real validation and large-scale dataset creation. Statistical analyses using multi-modal embeddings and mutual information indicate strong semantic alignment and effective mediation between personas, environments, and behaviors, and the generated data aligns well with the real-world HOMER dataset. The framework is intended for developing and testing intelligent household devices, balancing scalability with semantic fidelity and laying groundwork for more robust embodied AI data ecosystems.
Abstract
Advances in foundation models have catalyzed research in Embodied AI on interactive agents capable of reasoning about and interacting with their environments. Developing such agents requires diverse, large-scale datasets. Prior frameworks generate synthetic data for long-term human-robot interactions but fail to model the bidirectional influence between human behavior and household environments. Our proposed generative framework creates household datasets at scale through loosely coupled generation of long-term human-robot interactions and environments. Human personas influence environment generation, while environment schematics and semantics shape human-robot interactions. The generated 3D data includes rich static context, such as object and environment semantics, and temporal context capturing human and agent behaviors over extended periods. Our flexible tool lets users configure both the environment and the human activity data through natural language prompts. The tool then creates variations of these user-defined configurations, enabling scalable data generation. We validate our framework through statistical evaluation using multi-modal embeddings and four key metrics: cosine similarity, mutual information gain, intervention analysis, and iterative improvement validation. Statistical comparisons show good alignment with real-world datasets (HOMER; cosine similarity 0.60), while synthetic datasets (Wang et al.) show moderate alignment (0.27). Intervention analysis across changes in age, organization, and sleep pattern shows statistically significant effects (p < 0.001) with large effect sizes (Cohen's d = 0.51-1.12), confirming that the bidirectional coupling translates persona traits into measurable environmental and behavioral differences. These contributions enable the development and testing of household smart devices at scale.
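For readers unfamiliar with two of the statistics quoted above, the following is a minimal sketch of how cosine similarity between two embedding vectors and Cohen's d between two samples are commonly computed. It is illustrative only: the function names are hypothetical, the inputs are plain Python lists rather than the multi-modal embeddings used in this work, and the pooled-standard-deviation form of Cohen's d is an assumption, since the abstract does not state which variant was used.

```python
import math
from statistics import mean, variance


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cohens_d(x, y):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    nx, ny = len(x), len(y)
    # statistics.variance is the unbiased sample variance (divides by n - 1).
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Toy usage with made-up numbers, not data from this work.
baseline = [1.0, 3.0, 5.0]
intervened = [2.0, 4.0, 6.0]
effect = cohens_d(intervened, baseline)
similarity = cosine_similarity([1.0, 0.0], [1.0, 0.0])
```

Under this convention, a cosine similarity near 1 means two embeddings point in nearly the same direction, and Cohen's d expresses the difference between group means in units of the pooled standard deviation.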
