ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions
Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na, Hanbyul Joo
TL;DR
ParaHome presents a comprehensive 3D human-object interaction (HOI) capture system for natural home activities, integrating a 70-camera multi-view rig with an IMU-based body suit and hand gloves to jointly track human bodies, dexterous hands, and articulated objects. The authors formalize a parameterized 3D HOI space and demonstrate robust object articulation modeling, body–object alignment, and post-processing enhancements, culminating in a new dataset of 38 participants, 22 objects, and 486 minutes with rich text annotations. They further explore two generative modeling tasks, text-conditioned motion synthesis and object-guided body motion synthesis, using diffusion-based and transformer-based approaches, showing that mixed data from ParaHome and existing datasets yields competitive quality and enables interpolations between styles. Together, these contributions advance naturalistic HOI understanding and open avenues for text- and object-driven 3D motion synthesis in everyday environments.
Abstract
To enable machines to understand the way humans interact with the physical world in daily life, 3D interaction signals should be captured in natural settings, allowing people to engage with multiple objects in a range of sequential and casual manipulations. To achieve this goal, we introduce the ParaHome system, designed to capture dynamic 3D movements of humans and objects within a common home environment. Our system features a multi-view setup with 70 synchronized RGB cameras, along with wearable motion capture devices including an IMU-based body suit and hand motion capture gloves. Using the ParaHome system, we collect a new human-object interaction dataset comprising 486 minutes of sequences across 207 captures with 38 participants, offering advancements in three key aspects: (1) capturing body motion and dexterous hand manipulation motion alongside multiple objects within a contextual home environment; (2) encompassing sequential and concurrent manipulations paired with text descriptions; and (3) including articulated objects with multiple parts represented by 3D parameterized models. We present detailed design justifications for our system and perform key generative modeling experiments to demonstrate the potential of our dataset.
