ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions

Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na, Hanbyul Joo

TL;DR

ParaHome presents a 3D human-object interaction (HOI) capture system for natural home activities. It combines a multi-view rig of 70 synchronized RGB cameras with an IMU-based body suit and hand motion capture gloves to jointly track human bodies, dexterous hands, and articulated objects. The authors formalize a parameterized 3D HOI space and demonstrate object articulation modeling, body-object alignment, and post-processing enhancements, culminating in a new dataset of 38 participants, 22 objects, and 486 minutes with text annotations. They further explore two generative modeling tasks, text-conditioned motion synthesis and object-guided body motion synthesis, using diffusion-based and transformer-based approaches, showing that mixed data from ParaHome and existing datasets yields competitive quality and enables interpolations between styles. Together, these contributions advance naturalistic HOI understanding and open avenues for text- and object-driven 3D motion synthesis in everyday environments.

Abstract

To enable machines to understand the way humans interact with the physical world in daily life, 3D interaction signals should be captured in natural settings, allowing people to engage with multiple objects in a range of sequential and casual manipulations. To achieve this goal, we introduce our ParaHome system designed to capture dynamic 3D movements of humans and objects within a common home environment. Our system features a multi-view setup with 70 synchronized RGB cameras, along with wearable motion capture devices including an IMU-based body suit and hand motion capture gloves. By leveraging the ParaHome system, we collect a new human-object interaction dataset, including 486 minutes of sequences across 207 captures with 38 participants, offering advancements with three key aspects: (1) capturing body motion and dexterous hand manipulation motion alongside multiple objects within a contextual home environment; (2) encompassing sequential and concurrent manipulations paired with text descriptions; and (3) including articulated objects with multiple parts represented by 3D parameterized models. We present detailed design justifications for our system, and perform key generative modeling experiments to demonstrate the potential of our dataset.

Paper Structure (32 sections, 9 equations, 18 figures, 9 tables)

Figures (18)

  • Figure 1: Our system captures the detailed 3D movements of the human body, hands, and diverse objects, along with text descriptions.
  • Figure 2: (Center) Reconstructed scene of ParaHome from the top view. Pictures adjacent to the rendering were taken from the center of the room, facing the corresponding black dots in the scene. (Right) Pictures of an RGB camera, the IMU-based motion capture devices with attached body markers, and the 3D marker solution on an articulated object.
  • Figure 3: (Left) Scanned 3D models in the ParaHome system. (Right) Articulation states of the 3D models. Blue bars show the object-specific parameters $s_{e,i}^j(t)$ for each object part $i$. As $s_{e,i}^j(t)$ changes, the corresponding parts of the objects show different articulation states.
  • Figure 4: (Left) Before/after body calibration. Orange: forward kinematics output; blue: RGB-triangulated result. (Right) Hand calibration protocol and before/after calibration.
  • Figure 5: An example of SMPL-X shape parameter fitting. (Left) Projected keypoints, mask, and rendered SMPL-X with the optimized shape parameter. (Right) Rendered SDF within $5\,\mathrm{cm}$, visualizing affordance information using the optimized SMPL-X.
  • ...and 13 more figures