HOI-Dyn: Learning Interaction Dynamics for Human-Object Motion Diffusion
Lin Wu, Zhixiang Chen, Jianglin Lan
TL;DR
This work reframes human-object interaction (HOI) generation as a driver-responder problem in which human actions drive object responses. It introduces HOI-Dyn, a framework built around a lightweight transformer-based interaction dynamics model that predicts how objects should react to human motion, together with a residual-based dynamics loss that enforces causally consistent object reactions. The dynamics model is used only during training, so inference efficiency is preserved. A conditional diffusion backbone jointly models human, object, and interaction context, with an auxiliary dynamics loss and a horizon extension to capture varying interaction magnitudes. Experiments on FullBodyManipulation and 3D-FUTURE report state-of-the-art results across multiple metrics, along with 3D scene applications and a dynamics-based metric for evaluating causal consistency. The reported gains are in physical plausibility, temporal coherence, and contact realism, with potential applications in VR/AR, animation, and robotics; richer object representations and multi-agent scalability are identified as directions for future work.
Abstract
Generating realistic 3D human-object interactions (HOIs) remains a challenging task due to the difficulty of modeling detailed interaction dynamics. Existing methods treat human and object motions independently, resulting in physically implausible and causally inconsistent behaviors. In this work, we present HOI-Dyn, a novel framework that formulates HOI generation as a driver-responder system, where human actions drive object responses. At the core of our method is a lightweight transformer-based interaction dynamics model that explicitly predicts how objects should react to human motion. To further enforce consistency, we introduce a residual-based dynamics loss that mitigates the impact of dynamics prediction errors and prevents misleading optimization signals. The dynamics model is used only during training, preserving inference efficiency. Through extensive qualitative and quantitative experiments, we demonstrate that our approach not only enhances the quality of HOI generation but also establishes a feasible metric for evaluating the quality of generated interactions.
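To make the idea of a residual-based dynamics loss concrete, the following is a minimal sketch under stated assumptions, not the authors' formulation. It assumes the loss compares the dynamics model's prediction error on a generated human-object pair against its error on a reference pair, so that the model's own inaccuracy cancels out. The names `toy_dynamics` and `residual_dynamics_loss`, the array shapes, and the linear stand-in for the transformer-based dynamics model are all hypothetical.

```python
import numpy as np


def toy_dynamics(human_motion, weights):
    """Hypothetical stand-in for the learned interaction dynamics model.

    Maps human motion of shape (T, Dh) to a predicted object motion of
    shape (T, Do). The paper uses a transformer; a linear map is assumed
    here only to keep the sketch self-contained.
    """
    return human_motion @ weights


def naive_dynamics_loss(human_gen, object_gen, weights):
    """Penalise any gap between the generated object motion and the
    dynamics prediction, including gaps caused by the model's own error."""
    predicted = toy_dynamics(human_gen, weights)
    return float(np.mean((object_gen - predicted) ** 2))


def residual_dynamics_loss(human_gen, object_gen, human_ref, object_ref, weights):
    """Assumed residual form: subtract the dynamics model's error on the
    reference pair, so only the *extra* inconsistency of the generated
    pair is penalised."""
    ref_residual = object_ref - toy_dynamics(human_ref, weights)
    gen_residual = object_gen - toy_dynamics(human_gen, weights)
    return float(np.mean((gen_residual - ref_residual) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    human = rng.normal(size=(8, 6))          # 8 frames, 6 human features
    true_weights = rng.normal(size=(6, 3))   # ground-truth response
    obj = human @ true_weights               # 3 object features per frame
    imperfect = true_weights + 0.1           # deliberately wrong dynamics model

    print("naive loss on perfect sample:   ", naive_dynamics_loss(human, obj, imperfect))
    print("residual loss on perfect sample:", residual_dynamics_loss(human, obj, human, obj, imperfect))
```

In this toy setting, a generated sample that matches the reference exactly incurs no residual penalty even though the dynamics model is imperfect, whereas the naive loss would still push the generator away from the correct motion. This is one plausible reading of how a residual formulation can prevent misleading optimization signals.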
