PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation
Yunze Liu, Changxi Chen, Chenjing Ding, Li Yi
TL;DR
This work addresses the challenge of real-time, physically plausible humanoid reaction synthesis in multi-human settings, where prior kinematics-based methods produce nonphysical artifacts and diffusion-based approaches struggle with online inference. It introduces Forward Dynamics Guided 4D Imitation, which leverages a Forward Dynamics Model (FDM) to guide time-aware imitation learning, enabling online reactions at $30$ fps and improved physical plausibility. The approach comprises Demonstration Generation from motion capture via a universal motion tracker, learning a stochastic FDM with state/action VAEs and a contrastive loss, and an Iterative Generalist-Specialist Learning strategy to train a universal reactor policy. On InterHuman and Chi3D, the method outperforms baselines, exhibits robustness to motion-capture noise, and remains effective with limited training data, supporting real-time deployment on common GPUs.
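To make the online setting concrete, the following is a minimal sketch of a per-frame reaction loop running against a $30$ fps budget. It is not the paper's implementation: `placeholder_policy`, `placeholder_sim_step`, and `run_online` are hypothetical names, the policy and simulator are toy stand-ins for the learned reactor policy and a physics engine, and states are plain lists of floats. The only point illustrated is that each step uses the actor's pose observed so far and must finish within roughly $33.3$ ms.

```python
import time
from typing import Callable, Iterable, List, Sequence, Tuple

FPS = 30  # target frame rate stated in the text

State = List[float]


def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available per frame, in milliseconds."""
    return 1000.0 / fps


def placeholder_policy(reactor_state: Sequence[float],
                       actor_pose: Sequence[float]) -> State:
    # Hypothetical stand-in for the learned reactor policy:
    # move a fixed fraction of the way toward the actor's current pose.
    return [0.1 * (a - r) for r, a in zip(reactor_state, actor_pose)]


def placeholder_sim_step(reactor_state: Sequence[float],
                         action: Sequence[float]) -> State:
    # Hypothetical stand-in for one physics-simulation step.
    return [r + u for r, u in zip(reactor_state, action)]


def run_online(actor_stream: Iterable[Sequence[float]],
               init_state: Sequence[float],
               policy: Callable = placeholder_policy,
               sim_step: Callable = placeholder_sim_step,
               fps: float = FPS) -> Tuple[List[State], int]:
    """Process actor poses one frame at a time, as an online system would.

    Returns the reactor states and the number of frames whose
    policy-plus-simulation time exceeded the per-frame budget.
    """
    budget = frame_budget_ms(fps)
    state = list(init_state)
    states: List[State] = []
    over_budget = 0
    for actor_pose in actor_stream:
        start = time.perf_counter()
        action = policy(state, actor_pose)   # uses only the current observation
        state = sim_step(state, action)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if elapsed_ms > budget:
            over_budget += 1
        states.append(state)
    return states, over_budget
```

In a real system the two placeholders would be replaced by the trained policy and the simulator, and the count of over-budget frames gives a quick check on whether a given GPU sustains the target rate.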
Abstract
Humanoid reaction synthesis is pivotal for creating highly interactive and empathetic robots that can seamlessly integrate into human environments, enhancing the way we live, work, and communicate. However, it is difficult to learn the diverse interaction patterns of multiple humans and to generate physically plausible reactions. Kinematics-based approaches suffer from artifacts such as floating feet, foot sliding, and penetration, all of which violate physical plausibility. Existing physics-based methods often rely on kinematics-based methods to generate reference states, and they struggle with the kinematic noise this introduces during action execution. Because they rely on diffusion models, these methods also cannot achieve real-time inference. In this work, we propose a Forward Dynamics Guided 4D Imitation method to generate physically plausible, human-like reactions. The learned policy generates such reactions in real time, significantly improving both the speed (33×) and the quality of reactions compared with the existing method. Our experiments on the InterHuman and Chi3D datasets, along with ablation studies, demonstrate the effectiveness of our approach.
