GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction
Patrick Kwon, Chen Chen, Hanbyul Joo
TL;DR
GraspDiffusion tackles realistic full-body hand–object interaction (HOI) generation in two stages. It first predicts a 3D full-body grasp pose conditioned on an object, then uses that pose to guide a scene-generation diffusion model that synthesizes high-quality images while enforcing accurate spatial relations and preserving human identity. The method decouples body and hand priors, conditions generation on three spatial cues through attention, and is trained and evaluated against baselines on a curated pseudo-3D HOI dataset. Quantitative and qualitative results show improved image fidelity, pose plausibility, and interaction realism, and the method applies to diverse object inputs and artistic styles. Limitations include texture inconsistencies and a focus on single objects, which point to future work on multi-person scenes, text-controllable prompts, and video HOI synthesis.
Abstract
Recent generative models can synthesize high-quality images, but they often fail to generate humans interacting with objects using their hands. This failure stems mainly from the models' poor understanding of such interactions and the difficulty of synthesizing intricate regions of the body. In this paper, we propose GraspDiffusion, a novel generative method that creates realistic scenes of human-object interaction. Given a 3D object, GraspDiffusion constructs whole-body poses with control over the object's location relative to the human body. It achieves this by leveraging separate generative priors for body and hand poses and optimizing them into a joint grasping pose. This pose guides image synthesis to correctly reflect the intended interaction, producing realistic and diverse human-object interaction scenes. We demonstrate that GraspDiffusion successfully tackles the relatively underexplored problem of generating full-body human-object interactions and outperforms previous methods. Our project page is available at https://yj7082126.github.io/graspdiffusion/
