PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction

Yinhuai Wang, Jing Lin, Ailing Zeng, Zhengyi Luo, Jian Zhang, Lei Zhang

TL;DR

This work tackles the imitation of dynamic whole-body human-object interaction (HOI) in physics-based simulation. It introduces PhysHOI, a framework that guides imitation with a general-purpose contact graph and a task-agnostic reward instead of task-specific rewards. PhysHOI pairs a contact-aware HOI representation with this imitation reward and trains a policy with PPO in a simulation loop that controls a 52-part SMPL-X humanoid interacting with objects, aided by an aggregated contact graph. The BallPlay dataset of eight basketball skills supplies the dynamic HOI data needed for learning. Experiments on GRAB and BallPlay show higher success rates and lower tracking errors, and the contact graph reward (CGR) is shown to be crucial for accurate contact and robust imitation. By reducing reliance on hand-crafted rewards and explicitly modeling contact, this work advances general HOI learning for robotics and animation.
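The TL;DR states that the policy is optimized with PPO. As a generic illustration, not the authors' training code, the sketch below computes the standard PPO clipped surrogate objective over a batch of actions. The function name, the plain-list inputs, and the default clip range of 0.2 are assumptions chosen for readability; PhysHOI's actual PPO configuration is not given here.

```python
import math
from typing import Sequence


def ppo_clipped_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Standard PPO clipped surrogate objective (to be maximized), batch-averaged.

    logp_new / logp_old: log-probabilities of the taken actions under the current
    policy and under the policy that collected the data.
    advantages: advantage estimates for those actions.
    """
    n = len(advantages)
    if n == 0 or len(logp_new) != n or len(logp_old) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / n


print(ppo_clipped_objective([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))  # 2.0
print(ppo_clipped_objective([1.0], [0.0], [1.0]))  # 1.2 (ratio e is clipped)
```

Taking the minimum means a large policy change can never increase the objective beyond the clipped value, which keeps each update close to the data-collecting policy.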

Abstract

Humans interact with objects all the time. Enabling a humanoid to learn human-object interaction (HOI) is a key step toward future smart animation and intelligent robotics systems. However, recent progress in physics-based HOI relies on carefully designed task-specific rewards, which makes such systems hard to scale and labor-intensive to build. This work focuses on dynamic HOI imitation: teaching a humanoid dynamic interaction skills by imitating kinematic HOI demonstrations. This is challenging because of the complex interactions between body parts and objects and the lack of dynamic HOI data. To address these issues, we present PhysHOI, the first physics-based whole-body HOI imitation approach without task-specific reward designs. In addition to the kinematic HOI representations of humans and objects, we introduce the contact graph to explicitly model the contact relations between body parts and objects. We also design a contact graph reward, which proves critical for precise HOI imitation. Built on these key designs, PhysHOI imitates diverse HOI tasks simply and effectively without prior knowledge. To make up for the lack of dynamic HOI scenarios in this area, we introduce the BallPlay dataset, which contains eight whole-body basketball skills. We validate PhysHOI on diverse HOI tasks, including whole-body grasping and basketball skills.
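To make the contact graph reward more concrete, here is a minimal sketch under stated assumptions. It treats each contact-graph edge as a binary label (1 = in contact), scores the simulated labels against the reference labels with an exponential of the number of mismatched edges, and multiplies that term with the kinematic reward terms. The exponential form, the weight `lam`, the multiplicative combination, and both function names are illustrative assumptions, not the paper's exact equations.

```python
import math
from typing import Sequence


def contact_graph_reward(
    sim_contacts: Sequence[int], ref_contacts: Sequence[int], lam: float = 3.0
) -> float:
    """Hypothetical CG reward: 1.0 when every simulated contact label matches the
    reference, decaying exponentially with the number of mismatched edges.
    lam is an assumed sharpness weight."""
    if len(sim_contacts) != len(ref_contacts):
        raise ValueError("contact label sequences must have the same length")
    mismatches = sum(abs(s - r) for s, r in zip(sim_contacts, ref_contacts))
    return math.exp(-lam * mismatches)


def imitation_reward(kinematic_rewards: Sequence[float], cg_reward: float) -> float:
    """Assumed multiplicative combination: any badly matched term pulls the total
    toward 0, so good kinematics cannot compensate for wrong contacts."""
    total = cg_reward
    for r in kinematic_rewards:
        total *= r
    return total


# Hypothetical edges: [ball <-> hands, ball <-> rest of body]
ref = [1, 0]  # reference: the hands hold the ball
sim = [0, 1]  # simulation: the ball rests against the body instead
print(contact_graph_reward(ref, ref))  # 1.0
print(imitation_reward([0.9, 0.8], contact_graph_reward(sim, ref)))
```

In the usage example, the kinematic terms alone would give a fairly high reward, but the contact mismatch drives the combined reward close to zero, which illustrates why an explicit contact term can matter when kinematic errors are small.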

Paper Structure (34 sections, 13 equations, 13 figures, 5 tables)

Introduction
Related Work
Method
Preliminaries on Reinforcement Learning
Task Definition
Overview of PhysHOI
General-purpose Contact Graph
Contact-Aware HOI Representation
Task-agnostic HOI Imitation Reward
State
Policy and Action
Simulation Setting
The BallPlay Dataset
Experiment
Evaluation on HOI Imitation
...and 19 more sections

Figures (13)

Figure 1: Framework overview: the proposed pipeline for learning HOI skills from HOI demonstrations. Kinematic HOI data can be captured with mocap devices or estimated from monocular videos. We then convert the HOI data into reference HOI states, a contact-aware HOI representation {human motion, object motion, interaction graph (IG), contact graph (CG)} for PhysHOI to learn. PhysHOI: training alternates between simulation and optimization. Given the simulated HOI state $\boldsymbol{g}_t$ and the reference HOI state $\boldsymbol{h}_{t+1}$, the policy outputs the action $\boldsymbol{a}_t$, and the physics simulator then updates the simulated HOI state. At each time step, we compute the proposed task-agnostic HOI imitation reward, which includes kinematic rewards and the key CG reward. We train the policy until it converges, at which point it can control simulated humanoids to reproduce the reference HOI skills.
Figure 2: Contact Graph. (a) The nodes of the complete contact graph consist of all the objects and humanoid body parts. Each edge stores a binary contact label. (b) A node in the aggregated contact graph can contain multiple body parts.
Figure 3: Our method controls simulated humanoids to perform various basketball skills. Top to bottom: 1) rebound; 2) single-hand toss and catch; 3) back dribbling; 4) cross-leg dribble. The object is marked red when it is in contact with the humanoid.
Figure 4: The BallPlay dataset. We show the eight HOI demonstrations of highly dynamic basketball skills. For each skill, the upper rows show the real-life videos, and the lower rows show the estimated whole-body SMPL-X human model and object mesh.
Figure 5: Qualitative results on HOI imitation. * denotes re-implemented methods. Previous methods that use only kinematic rewards fail to reproduce the interaction accurately, e.g., the ball falls or the grasp fails. Guided by the contact graph, our method imitates the HOI successfully. The object is marked red when it is in contact with the humanoid, and the frame where the failure begins is outlined in red.
...and 8 more figures
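Figure 2 describes two forms of the contact graph: a complete graph with one binary edge per (body part, object) pair, and an aggregated graph whose nodes can group several body parts. A minimal sketch of the aggregation step for a single object follows. It assumes that an aggregated node counts as in contact when any of its member body parts is (a logical OR); the function name, body-part names, and groupings are hypothetical.

```python
from typing import Dict, List, Set


def aggregate_contacts(
    part_contacts: Dict[str, int], groups: Dict[str, List[str]]
) -> Dict[str, int]:
    """Collapse complete-graph contact labels (body part <-> one object) into
    aggregated-node labels.

    Assumption: a node is in contact (1) if any of its member body parts is;
    parts missing from part_contacts are treated as not in contact (0).
    """
    seen: Set[str] = set()
    aggregated: Dict[str, int] = {}
    for node, parts in groups.items():
        for part in parts:
            # Each body part should belong to exactly one aggregated node
            if part in seen:
                raise ValueError(f"body part {part!r} assigned to more than one node")
            seen.add(part)
        aggregated[node] = int(any(part_contacts.get(part, 0) for part in parts))
    return aggregated


# Hypothetical example: a ball touched only by the right hand
contacts = {"L_Hand": 0, "R_Hand": 1, "Pelvis": 0, "Torso": 0}
groups = {"hands": ["L_Hand", "R_Hand"], "rest_of_body": ["Pelvis", "Torso"]}
print(aggregate_contacts(contacts, groups))  # {'hands': 1, 'rest_of_body': 0}
```

With this grouping, a reward computed on the aggregated labels only asks whether the hands (as a group) touch the ball, not which hand does.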
