Interaction-Aware 4D Gaussian Splatting for Dynamic Hand-Object Interaction Reconstruction
Hao Tian, Chenyangguang Zhang, Rui Liu, Wen Shen, Xiaolin Qin
TL;DR
This work tackles dynamic hand-object interaction (HOI) reconstruction without object priors. It introduces interaction-aware hand-object Gaussians with learnable weights $w$ and radii $o$, coupled with hand-informed object deformation across three implicit fields (hand, object, background). A progressive optimization framework and explicit 3D regularizations guide physically plausible, edge-accurate reconstructions under heavy occlusion. On the HOI4D and HO3D datasets, the approach outperforms state-of-the-art 4D Gaussian Splatting methods in quantitative metrics (e.g., PSNR, SSIM, LPIPS) and in qualitative renderings, and it handles occlusion, contact, and dynamic hand-object configurations robustly. This matters for VR and robotics scenarios where object priors are unavailable or impractical to acquire, since it enables faithful and fast HOI scene reconstruction from RGB inputs.
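PSNR is one of the metrics cited above. As a reminder of what that number measures, here is a minimal sketch of the standard definition, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. It assumes images are float arrays scaled to $[0, 1]$; the function name `psnr` and its parameters are illustrative and are not taken from the paper's code.

```python
import numpy as np


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    Assumption: pixel values lie in [0, max_value]. This is the textbook
    definition, not necessarily the exact evaluation script of the paper.
    """
    reference = np.asarray(reference, dtype=np.float64)
    rendered = np.asarray(rendered, dtype=np.float64)
    if reference.shape != rendered.shape:
        raise ValueError("images must have the same shape")

    # Mean squared error over all pixels and channels.
    mse = np.mean((reference - rendered) ** 2)
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


if __name__ == "__main__":
    ground_truth = np.zeros((4, 4, 3))
    render = np.full((4, 4, 3), 0.1)  # uniform error of 0.1 per pixel
    print(f"PSNR: {psnr(ground_truth, render):.2f} dB")
```

Higher PSNR and SSIM indicate a closer match to the ground-truth frame, while lower LPIPS indicates a smaller perceptual distance.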
Abstract
This paper focuses on the challenging setting of simultaneously modeling the geometry and appearance of hand-object interaction scenes without any object priors. We follow the trend of dynamic 3D Gaussian Splatting based methods and address several significant challenges. To model complex hand-object interaction with mutual occlusion and edge blur, we present interaction-aware hand-object Gaussians with newly introduced optimizable parameters that adopt a piecewise linear hypothesis for a clearer structural representation. Moreover, considering the complementarity and tightness of hand shape and object shape during interaction dynamics, we incorporate hand information into the object deformation field, constructing interaction-aware dynamic fields to model flexible motions. To further address difficulties in the optimization process, we propose a progressive strategy that handles dynamic regions and the static background step by step. Correspondingly, explicit regularizations are designed to stabilize the hand-object representations for smooth motion transitions, physically realistic interaction, and coherent lighting. Experiments show that our approach surpasses existing dynamic 3D-GS-based methods and achieves state-of-the-art performance in reconstructing dynamic hand-object interaction.
