ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation

Ling-An Zeng; Guohong Huang; Yi-Lin Wei; Shengbo Gu; Yu-Ming Tang; Jingke Meng; Wei-Shi Zheng

TL;DR

ChainHOI introduces a dual-level framework for text-driven HOI generation that explicitly models joint-level interactions with a Generative Spatiotemporal Graph Convolution Network (GST-GCN) and kinetic-chain interactions with a Kinematics-based Interaction Module (KIM). By employing a joint graph that includes an object node and a kinetic-chain token mechanism, the model captures both short-/long-term joint relations and inter-joint coordination within biomechanical constraints. Training combines diffusion-based generation with auxiliary losses that penalize penetration and incorrect object motion, yielding semantically coherent and physically plausible HOIs. Evaluations on BEHAVE and OMOMO demonstrate state-of-the-art performance in motion quality and interaction realism, with additional gains from Affordance-guided Interaction Correction (AIC). The approach advances text-driven HOI synthesis by making the physics and geometry of human-object interactions explicit and biomechanically coherent, enabling more controllable and realistic animations for AR/VR, gaming, and film production.

Abstract

We propose ChainHOI, a novel approach for text-driven human-object interaction (HOI) generation that explicitly models interactions at both the joint and kinetic chain levels. Unlike existing methods that implicitly model interactions using full-body poses as tokens, we argue that explicitly modeling joint-level interactions is more natural and effective for generating realistic HOIs, as it directly captures the geometric and semantic relationships between joints, rather than modeling interactions in the latent pose space. To this end, ChainHOI introduces a novel joint graph to capture potential interactions with objects, and a Generative Spatiotemporal Graph Convolution Network to explicitly model interactions at the joint level. Furthermore, we propose a Kinematics-based Interaction Module that explicitly models interactions at the kinetic chain level, ensuring more realistic and biomechanically coherent motions. Evaluations on two public datasets demonstrate that ChainHOI significantly outperforms previous methods, generating more realistic and semantically consistent HOIs. Code is available at https://github.com/qinghuannn/ChainHOI.
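The idea of a joint graph that includes an object node can be illustrated with a small sketch. This is not the authors' implementation: the 5-joint toy skeleton, the choice to connect the object node to every joint, the symmetric normalization, and all function names (`build_joint_graph`, `normalize_adjacency`, `aggregate`) are assumptions made for illustration. The learnable projection and the temporal dimension of a real spatiotemporal graph convolution are omitted.

```python
import math

def build_joint_graph(num_joints, bones):
    """Adjacency for skeleton joints plus one object node (the last index).

    Assumption: the object node is linked to every joint so that any joint
    can potentially interact with the object.
    """
    n = num_joints + 1
    obj = num_joints
    adj = [[0.0] * n for _ in range(n)]
    for a, b in bones:
        adj[a][b] = adj[b][a] = 1.0  # undirected skeleton edge
    for j in range(num_joints):
        adj[obj][j] = adj[j][obj] = 1.0  # joint-object edge
    return adj

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, a common GCN choice."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def aggregate(features, adj_norm):
    """One spatial aggregation step: each node mixes its neighbors' features."""
    n = len(adj_norm)
    dim = len(features[0])
    return [[sum(adj_norm[i][k] * features[k][d] for k in range(n))
             for d in range(dim)]
            for i in range(n)]

# Hypothetical toy skeleton: a single chain of 5 joints (0-1-2-3-4).
BONES = [(0, 1), (1, 2), (2, 3), (3, 4)]
adj = build_joint_graph(5, BONES)
adj_norm = normalize_adjacency(adj)

# Per-node 3D positions for one frame; the last row is the object node.
features = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 1.0, 0.0],
            [0.3, 1.0, 0.0], [0.6, 1.0, 0.0], [0.8, 1.0, 0.1]]
out = aggregate(features, adj_norm)
print(len(out), len(out[0]))
```

In this sketch, every joint receives information from the object node after a single aggregation step, which is one simple way to make joint-object relations explicit rather than leaving them to a latent full-body pose token.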
