Interaction-Aware 4D Gaussian Splatting for Dynamic Hand-Object Interaction Reconstruction

Hao Tian, Chenyangguang Zhang, Rui Liu, Wen Shen, Xiaolin Qin

TL;DR

This work tackles dynamic hand-object interaction reconstruction without object priors by introducing interaction-aware hand-object Gaussians with learnable weights $w$ and radii $o$, coupled with hand-informed object deformation across three implicit fields (hand, object, background). A progressive optimization framework and explicit 3D regularizations guide physically plausible, edge-accurate reconstructions under heavy occlusion, outperforming state-of-the-art 4D Gaussian Splatting methods on HOI4D and HO3D datasets. The approach achieves superior quantitative metrics (e.g., PSNR/SSIM/LPIPS) and qualitative renderings, demonstrating robust handling of occlusion, contact, and dynamic hand-object configurations without relying on object priors. This has practical impact for VR/robotics scenarios where object priors are unavailable or impractical to acquire, enabling faithful and fast HOI scene reconstruction from RGB inputs.

Abstract

This paper focuses on the challenging setting of simultaneously modeling the geometry and appearance of hand-object interaction scenes without any object priors. We follow the trend of dynamic 3D Gaussian Splatting based methods and address several significant challenges. To model complex hand-object interaction with mutual occlusion and edge blur, we present interaction-aware hand-object Gaussians with newly introduced optimizable parameters that adopt a piecewise linear hypothesis for a clearer structural representation. Moreover, considering the complementarity and tightness of hand shape and object shape during interaction dynamics, we incorporate hand information into the object deformation field, constructing interaction-aware dynamic fields to model flexible motions. To further address difficulties in the optimization process, we propose a progressive strategy that handles dynamic regions and the static background step by step. Correspondingly, explicit regularizations are designed to stabilize the hand-object representations for smooth motion transitions, physically realistic interaction, and coherent lighting. Experiments show that our approach surpasses existing dynamic 3D-GS-based methods and achieves state-of-the-art performance in reconstructing dynamic hand-object interaction.
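
The progressive strategy mentioned above can be pictured as a staged schedule that decides which of the three fields (hand, object, background) receive gradient updates at a given iteration. The following is only a minimal sketch under stated assumptions: the stage names, the ordering (dynamic regions first, background added later), the iteration threshold of 3000, and the helper names `Stage`, `active_stage`, and `trainable_fields` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str             # label for the stage (hypothetical)
    start_iter: int       # first iteration at which the stage is active
    trainable: frozenset  # names of the fields optimized during the stage


# Assumed schedule: the ordering and the threshold are illustrative only.
SCHEDULE = (
    Stage("dynamic", 0, frozenset({"hand", "object"})),
    Stage("joint", 3000, frozenset({"hand", "object", "background"})),
)


def active_stage(iteration, schedule=SCHEDULE):
    """Return the latest stage whose start_iter is <= iteration."""
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    current = schedule[0]
    for stage in schedule:
        if iteration >= stage.start_iter:
            current = stage
    return current


def trainable_fields(iteration, schedule=SCHEDULE):
    """Sorted names of the fields that would be updated at this iteration."""
    return sorted(active_stage(iteration, schedule).trainable)


if __name__ == "__main__":
    for it in (0, 2999, 3000):
        print(it, active_stage(it).name, trainable_fields(it))
```

In a training loop, a schedule of this kind would typically be consulted once per iteration to freeze or unfreeze the parameter groups of each field; how the actual method partitions its stages is not specified in this summary.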

Paper Structure

This paper contains 12 sections, 13 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Differences between traditional 3D Gaussian-based hand-object reconstruction and our interaction-aware modeling. Conventional 3D Gaussian approaches model the entire HOI scene with a single, unified implicit field and rely primarily on 2D supervision. This design often leads to geometric ambiguities during close interactions—such as collapsed clearances, blurred contact boundaries, and non-physical merging of hand and object surfaces (top). In contrast, our method explicitly decouples hand and object representations into separate fields, introduces interaction-aware parameters ($\boldsymbol{w}, \boldsymbol{o}$) to modulate occlusion and edge sharpness, and leverages interaction-aware losses to preserve fine-grained spatial relationships, enabling accurate and disentangled dynamic reconstruction (bottom).
  • Figure 2: Overview of interaction-aware hand-object Gaussians. We propose a novel framework for reconstructing dynamic HOI scenes from RGB videos without object shape priors. The framework consists of three components: (1) Specialized Implicit Fields: separate hand, object, and background fields disentangle dynamic interactions, with hand/object fields capturing high-frequency deformations and occlusions (leveraging hand information for object's interaction-aware deformation) while the background field maintains low-frequency stability; (2) Interaction-aware Gaussian: enhances representation with adaptive weights $w$ and radius $o$ to address contour ambiguity and occlusions; (3) Progressive Optimization: combines explicit supervision with physical interaction constraints for efficient convergence.
  • Figure 3: Qualitative comparison of our approach and the baseline methods. We present reconstructions from our model and SOTA baselines (4DGS, Deform3DGS, SC-GS) on HOI4D and HO3D datasets.
  • Figure 4: Novel view synthesis of our approach and SC-GS. Our method shows cleaner renderings from novel viewpoints (within the egocentric viewing cone), whereas SC-GS suffers from noticeable artifacts.
  • Figure 5: Our method maintains consistently high rendering quality across different noise levels, showing strong robustness to initialization errors.
  • ...and 1 more figure