CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

Yun Liu, Chengwen Zhang, Ruofan Xing, Bingda Tang, Bowen Yang, Li Yi

TL;DR

CORE4D tackles the scarcity of large-scale, high-fidelity 4D human-object-human (HOH) interaction data for collaborative object rearrangement by combining 1K real HOH sequences with 10K synthetic retargeted sequences across 3K object shapes. It introduces a hybrid data-collection pipeline that preserves temporal collaboration patterns while expanding spatial relations through collaboration retargeting, which includes object-centric DeepSDF-based retargeting and a human-centric pose discriminator for candidate selection. The work benchmarks two tasks, motion forecasting and interaction synthesis, which expose the challenges of modeling multi-person collaboration and show that synthetic data can improve forecasting performance and enable humanoid skill learning. By providing diverse object geometries, collaboration modes, and 3D scenes, CORE4D offers a practical resource for VR/AR, human-robot interaction, and humanoid manipulation research, while acknowledging limitations such as the absence of outdoor scenes and of visual data in the synthetic branch.

Abstract

Understanding how humans cooperatively rearrange household objects is critical for VR/AR and human-robot interaction. However, the modeling of these behaviors remains under-researched due to the lack of relevant datasets. We fill this gap by presenting CORE4D, a novel large-scale 4D human-object-human interaction dataset focusing on collaborative object rearrangement, which encompasses diverse compositions of various object geometries, collaboration modes, and 3D scenes. With 1K human-object-human motion sequences captured in the real world, we enrich CORE4D by contributing an iterative collaboration retargeting strategy to augment motions to a variety of novel objects. Leveraging this approach, CORE4D comprises a total of 11K collaboration sequences spanning 3K real and virtual object shapes. Benefiting from the extensive motion patterns provided by CORE4D, we benchmark two tasks aiming at generating human-object interaction: human-object motion forecasting and interaction synthesis. Extensive experiments demonstrate the effectiveness of our collaboration retargeting strategy and indicate that CORE4D poses new challenges to existing human-object interaction generation methodologies.

Paper Structure (51 sections, 22 equations, 12 figures, 9 tables)

Figures (12)

  • Figure 1: CORE4D-Real data capturing system. (a) demonstrates the wearing of mocap suits and the positioning of the egocentric camera. (b) shows an object with four markers. (c) illustrates the data capturing system and camera views.
  • Figure 2: CORE4D-Real data modality overview.
  • Figure 3: Collaboration retargeting pipeline. We propose a collaboration retargeting algorithm that iteratively refines interaction motion. The input is a source-target pair. First, we sample contact candidates on the source from the contact knowledge of the whole of CORE4D-Real. For each contact candidate, contact retargeting propagates it to contact constraints on the target. A motion sampled from CORE4D-Real provides the high-level collaboration pattern; combined with the low-level contact constraints, interaction retargeting produces interaction candidates. The human pose discriminator then selects the best candidates, which prompts an update of the contact constraints via beam search. After multiple iterations, the process yields augmented interactions. This iterative mechanism can find a reasonable choice among numerous contact constraints and ensures a refined interaction, enhancing the dataset's applicability across various scenarios.
  • Figure 4: Dataset statistics. (a) shows object samples from six categories. Bars in (b) indicate when the person is in contact with the object during the entire collaborative object rearrangement interaction process. (c) presents the proportion of collaboration modes in the dataset.
  • Figure 5: Visualization of the humanoid box-lifting skill trained by CORE4D via imitation learning.
  • ...and 7 more figures
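
The Figure 3 caption describes a select-then-update loop: candidates are scored, the best are kept, and the kept candidates seed the next round via beam search. The following is a minimal, generic sketch of that loop only, not the authors' implementation. The names `iterative_beam_refine`, `expand`, and `score` are hypothetical; in particular, `score` merely stands in for the human pose discriminator, `expand` for the contact constraints update, and the toy 1-D candidates are purely illustrative.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

C = TypeVar("C")  # hypothetical candidate type (e.g. a set of contact constraints)


def iterative_beam_refine(
    initial: Sequence[C],
    expand: Callable[[C], List[C]],
    score: Callable[[C], float],
    beam_width: int = 3,
    iterations: int = 4,
) -> List[Tuple[float, C]]:
    """Keep the `beam_width` highest-scoring candidates over several rounds.

    Returns (score, candidate) pairs sorted from best to worst.
    """
    # Score the initial candidates and keep the best ones.
    beam = sorted(
        ((score(c), c) for c in initial), key=lambda sc: sc[0], reverse=True
    )[:beam_width]
    for _ in range(iterations):
        # Survivors stay in the pool so a round can never make the beam worse.
        pool = list(beam)
        for _, cand in beam:
            pool.extend((score(n), n) for n in expand(cand))
        beam = sorted(pool, key=lambda sc: sc[0], reverse=True)[:beam_width]
    return beam


# Toy stand-ins (assumptions): a candidate is a 1-D position, a neighbor is
# a half-unit shift, and the "discriminator" prefers positions close to 2.0.
def toy_expand(c: float) -> List[float]:
    return [c - 0.5, c + 0.5]


def toy_score(c: float) -> float:
    return -abs(c - 2.0)


beam = iterative_beam_refine(
    [0.0, 5.0], toy_expand, toy_score, beam_width=3, iterations=4
)
best_score, best_candidate = beam[0]
```

Keeping the surviving candidates in the pool is a design choice of this sketch: it makes the best score non-decreasing across rounds, at the cost of possible duplicates in the beam.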