Synthetic Multimodal Dataset for Empowering Safety and Well-being in Home Environments
Takanori Ugai, Shusaku Egami, Swe Nwe Nwe Htun, Kouji Kozaki, Takahiro Kawamura, Ken Fukuda
TL;DR
The paper presents a synthetic multimodal resource for safety and well-being in home environments that combines simulated videos with event-centric knowledge graphs (KGs), enabling joint reasoning over visual and symbolic data. It provides a large-scale dataset (203 scenarios, 1,218 videos) with RDF KGs totaling nearly 3 million triples and a SPARQL endpoint, supported by tools such as VirtualHome-AIST and VirtualHome2KG that scale actions and encode dynamic states. A suite of KG embeddings (TransE, ComplEx, RotatE, and jRDF2Vec) accompanies the KG data to support ML-based reasoning, while visualization and scenario-editor tools streamline data creation. The KGRC4SI framework provides structured tasks and evaluation criteria to advance multimodal reasoning for hazard detection in homes, and a pilot demonstrated diverse techniques ranging from rule-based systems to LLM-assisted explanations and script generation. The work contributes a reusable, semantically rich platform that can be extended to other domains and promotes grounding language models in concrete, world-aligned data for safety applications.
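To illustrate how translation-based embeddings such as TransE can support reasoning over KG triples, the following minimal sketch scores candidate tail entities by the distance between head + relation and tail. It uses the standard TransE scoring idea only; the entity and relation names (ex:event_grab_cup, ex:actsOn, and so on) and the three-dimensional vectors are invented for illustration and are not taken from the dataset or its released embeddings.

```python
import math

# Toy 3-dimensional embeddings. All identifiers and values below are
# hypothetical and are NOT taken from the dataset or its embedding files.
ENTITIES = {
    "ex:event_grab_cup": [0.1, 0.4, -0.2],
    "ex:object_cup": [0.6, 0.1, 0.1],
    "ex:object_stove": [-0.9, 0.8, 0.7],
}
RELATIONS = {
    "ex:actsOn": [0.5, -0.3, 0.3],
}


def transe_score(head, relation, tail):
    """Return -||h + r - t||_2; a higher score means a more plausible triple."""
    squared = sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    return -math.sqrt(squared)


def rank_tails(head_id, relation_id):
    """Rank every entity other than the head as a candidate tail, best first."""
    head = ENTITIES[head_id]
    relation = RELATIONS[relation_id]
    candidates = [e for e in ENTITIES if e != head_id]
    return sorted(
        candidates,
        key=lambda e: transe_score(head, relation, ENTITIES[e]),
        reverse=True,
    )


ranking = rank_tails("ex:event_grab_cup", "ex:actsOn")
print(ranking)
```

In practice the vectors would be loaded from trained embedding files rather than written by hand, and ComplEx or RotatE would replace the distance function with their own scoring functions.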
Abstract
This paper presents a synthetic multimodal dataset of daily activities that fuses video data from a 3D virtual space simulator with knowledge graphs depicting the spatiotemporal context of the activities. The dataset is developed for the Knowledge Graph Reasoning Challenge for Social Issues (KGRC4SI), which focuses on identifying and addressing hazardous situations in the home environment. The dataset is available to the public as a valuable resource for researchers and practitioners developing innovative solutions recognizing human behaviors to enhance safety and well-being in
