PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation

Yunze Liu, Changxi Chen, Chenjing Ding, Li Yi

TL;DR

This work addresses real-time, physically plausible humanoid reaction synthesis in multi-human settings, where prior kinematics-based methods produce nonphysical artifacts and diffusion-based approaches are too slow for online inference. It introduces Forward Dynamics Guided 4D Imitation, which uses a Forward Dynamics Model (FDM) to guide time-aware imitation learning, enabling online reactions at 30 fps with improved physical plausibility. The approach comprises Demonstration Generation from motion-capture data via a universal motion tracker, a stochastic FDM learned with state/action VAEs and a contrastive loss, and an Iterative Generalist-Specialist Learning strategy that trains a universal reactor policy. On InterHuman and Chi3D, the method outperforms baselines, is robust to motion-capture noise, and remains effective with limited training data, while running in real time on common GPUs.
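
The summary does not give the exact form of the FDM's contrastive loss. A common choice for this kind of objective is InfoNCE, where each predicted embedding must score higher against its own target than against the other targets in the batch. The sketch below assumes that form, and assumes (hypothetically) that the FDM's predicted next-state latent is contrasted against an encoding of the true next state; the function names and the temperature value are illustrative, not taken from the paper. It uses plain Python lists for readability; a real model would compute this on batched tensors in an autodiff framework.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(predicted, targets, temperature=0.1):
    """InfoNCE loss (assumed form, hypothetical names).

    predicted[i] is treated as the positive match for targets[i];
    every targets[j] with j != i serves as an in-batch negative.
    """
    n = len(predicted)
    total = 0.0
    for i in range(n):
        logits = [cosine(predicted[i], targets[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / n

# Toy usage: predictions that line up with their own targets give a low loss,
# predictions paired with the wrong targets give a high loss.
batch_pred = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce(batch_pred, aligned), info_nce(batch_pred, swapped))
```

The temperature controls how sharply the loss separates the positive from the negatives; smaller values penalize near-miss negatives more strongly.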

Abstract

Humanoid reaction synthesis is pivotal for creating highly interactive and empathetic robots that can integrate seamlessly into human environments, enhancing the way we live, work, and communicate. However, it is difficult to learn the diverse interaction patterns of multiple humans and to generate physically plausible reactions. Kinematics-based approaches suffer from artifacts such as floating feet, foot sliding, and penetration that violate physical plausibility. The existing physics-based method relies on kinematics-based methods to generate reference states and therefore struggles with kinematic noise during action execution. Because these methods rely on diffusion models, they cannot achieve real-time inference. In this work, we propose a Forward Dynamics Guided 4D Imitation method to generate physically plausible, human-like reactions. The learned policy generates such reactions in real time, significantly improving both speed (33× faster) and reaction quality compared with the existing method. Experiments on the InterHuman and Chi3D datasets, along with ablation studies, demonstrate the effectiveness of our approach.
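
The abstract does not spell out how the FDM guides imitation. One common pattern in model-based imitation is to roll the policy out through a differentiable learned dynamics model and descend the gradient of the state-tracking error, a gradient that a non-differentiable physics simulator cannot supply directly. The toy sketch below illustrates that pattern under explicit assumptions: a 1-D state, a fixed linear model `s' = A*s + B*a` standing in for the learned FDM, a one-parameter policy, and a hypothetical reference trajectory. None of these names or values come from the paper, and whether PhysReaction uses exactly this scheme is not stated here.

```python
# Toy forward-dynamics-guided imitation (hypothetical, 1-D).
# A fixed linear model stands in for a learned FDM: s' = A*s + B*a.
A, B = 1.0, 0.5

def rollout(k, reference, s0=0.0):
    """Roll the policy a_t = k * (r_{t+1} - s_t) through the model.

    Returns the imitation loss sum_t (s_{t+1} - r_{t+1})^2 and its exact
    derivative w.r.t. k, computed with forward-mode differentiation
    (ds = ds/dk is carried alongside the state s).
    """
    s, ds = s0, 0.0
    loss, dloss = 0.0, 0.0
    for target in reference[1:]:
        a = k * (target - s)
        da = (target - s) - k * ds          # d(a)/dk via the product rule
        s, ds = A * s + B * a, A * ds + B * da  # step state and its derivative
        err = s - target
        loss += err * err
        dloss += 2.0 * err * ds
    return loss, dloss

def train(reference, k=1.0, lr=0.5, steps=2000):
    # Plain gradient descent on the policy gain through the model rollout.
    for _ in range(steps):
        _, grad = rollout(k, reference)
        k -= lr * grad
    return k

reference = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]  # hypothetical demonstration states
initial_loss, _ = rollout(1.0, reference)
k = train(reference)
final_loss, _ = rollout(k, reference)
print(f"gain k = {k:.4f}, loss {initial_loss:.4f} -> {final_loss:.2e}")
```

Because the model is differentiable, the tracking error at every step contributes a gradient to the policy parameter, and descent drives the gain toward the value (k = 2 here, where B*k = 1) at which the rollout reproduces the reference exactly. A practical version would use a neural FDM and an autodiff framework rather than hand-derived forward-mode derivatives.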

Paper Structure

This paper contains 27 sections, 5 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: We introduce the Forward Dynamics Guided 4D Imitation method, a novel approach that employs a neural model to simulate human forward dynamics in real time at 30 fps (a 33× speedup). This model guides 4D imitation learning, enabling reactions that are not only physically plausible but also closely mimic human behavior. More details are available at https://yunzeliu.github.io/PhysReaction/
  • Figure 2: Our method can be divided into four major parts: Demonstration Generation Process, Forward Dynamics Model Training, Iterative Generalist-Specialist Learning Strategy, and Forward Dynamics Guided 4D Imitation Learning.
  • Figure 3: Qualitative results on InterHuman. Our method significantly outperforms InsActor in stability and realism: while InsActor tends to fall over, our approach generates stable interactions.
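
The speed figures quoted for Figure 1 can be unpacked with simple arithmetic. Assuming the 33× speedup is a ratio of frame rates for the same per-frame workload (the caption does not say how it was measured), 30 fps leaves a per-frame budget of about 33 ms, and the implied baseline rate is roughly 0.9 fps:

```latex
\Delta t_{\text{frame}} = \frac{1}{30\ \text{fps}} \approx 33.3\ \text{ms},
\qquad
f_{\text{baseline}} \approx \frac{30\ \text{fps}}{33} \approx 0.91\ \text{fps}
\;\Longleftrightarrow\;
\approx 1.1\ \text{s per frame}.
```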