Decoupled Generative Modeling for Human-Object Interaction Synthesis

Hwanhee Jung; Seunggwan Lee; Jeongyoon Yoon; SeungHyeon Kim; Giljoo Nam; Qixing Huang; Sangpil Kim

TL;DR

DecHOI synthesizes realistic human-object interaction by decoupling path planning from action generation. A diffusion-based trajectory generator produces human and object paths without manually specified waypoints, and a separate action generator synthesizes detailed motion conditioned on those paths. The framework adds an adversarial discriminator focused on distal joints and a dynamic planner (DynaPlan) for long-horizon, scene-aware planning in dynamic environments. On FullBodyManipulation and on unseen 3D-FUTURE objects, it outperforms prior methods on most quantitative metrics and in qualitative comparisons, and user-study participants prefer its results for text alignment and interaction realism. Decoupling reduces optimization complexity, improves contact realism, and supports reactive planning in multi-agent scenarios, which makes the approach relevant to practical 3D vision and robotics applications.

Abstract

Synthesizing realistic human-object interaction (HOI) is essential for 3D computer vision and robotics, underpinning animation and embodied control. Existing approaches often require manually specified intermediate waypoints and place all optimization objectives on a single network, which increases complexity, reduces flexibility, and leads to errors such as unsynchronized human and object motion or penetration. To address these issues, we propose Decoupled Generative Modeling for Human-Object Interaction Synthesis (DecHOI), which separates path planning and action synthesis. A trajectory generator first produces human and object trajectories without prescribed waypoints, and an action generator conditions on these paths to synthesize detailed motions. To further improve contact realism, we employ adversarial training with a discriminator that focuses on the dynamics of distal joints. The framework also models a moving counterpart and supports responsive, long-sequence planning in dynamic scenes, while preserving plan consistency. Across two benchmarks, FullBodyManipulation and 3D-FUTURE, DecHOI surpasses prior methods on most quantitative metrics and qualitative evaluations, and perceptual studies likewise prefer our results.
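The two-stage structure described above can be illustrated with a toy sketch. This is not the DecHOI implementation: `plan_paths`, `synthesize_motion`, `generate`, the 2D ground-plane representation, the start/goal interface, and the fixed object offset are all hypothetical stand-ins chosen only to show how a second stage can condition on the output of the first. A real trajectory generator would sample paths from a diffusion model instead of interpolating.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Vec2 = Tuple[float, float]


@dataclass
class Paths:
    human: List[Vec2]  # per-frame human root position on the ground plane
    obj: List[Vec2]    # per-frame object position on the ground plane


def plan_paths(start: Vec2, goal: Vec2, num_frames: int, seed: int = 0) -> Paths:
    """Stage 1 stand-in (hypothetical): interpolate start -> goal with small jitter.

    No intermediate waypoints are supplied; num_frames must be at least 2.
    """
    rng = random.Random(seed)
    human, obj = [], []
    for t in range(num_frames):
        a = t / (num_frames - 1)
        x = start[0] + a * (goal[0] - start[0])
        y = start[1] + a * (goal[1] - start[1])
        # Keep the endpoints exact; perturb only the interior frames.
        jitter = 0.0 if t in (0, num_frames - 1) else rng.uniform(-0.05, 0.05)
        human.append((x, y + jitter))
        obj.append((x + 0.3, y + jitter))  # assumed: object held at a fixed offset
    return Paths(human, obj)


def synthesize_motion(paths: Paths, num_joints: int = 4) -> List[List[Vec2]]:
    """Stage 2 stand-in (hypothetical): per-frame joints conditioned on the paths."""
    motion = []
    for root, target in zip(paths.human, paths.obj):
        # Joint 0 (root) follows the planned human path; the last joint
        # (a stand-in "hand") is placed on the planned object position.
        frame = [root] * (num_joints - 1) + [target]
        motion.append(frame)
    return motion


def generate(start: Vec2, goal: Vec2, num_frames: int = 8, seed: int = 0):
    """Run planning first, then action synthesis on the planned paths."""
    paths = plan_paths(start, goal, num_frames, seed)
    return paths, synthesize_motion(paths)
```

Calling `generate((0.0, 0.0), (1.0, 0.0))` returns the planned paths together with a motion whose root and last joint follow them exactly. The point of the sketch is the interface: because the second stage only consumes `Paths`, each stage can be trained or replaced on its own, which is the separation of objectives the abstract argues for.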
