InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians

Kefan Chen, Sergiu Oprea, Justin Theiss, Sreyas Mohan, Srinath Sridhar, Aayush Prakash

TL;DR

InteractAvatar tackles the challenge of rendering photorealistic hand-face interactions in digital avatars. It introduces a hybrid mesh-Gaussian avatar that couples a Dynamic Gaussian Hand with a learnable Hand-Face Interaction module. The Gaussians are anchored to FLAME and MANO meshes, and per-Gaussian MLPs capture pose-dependent geometry and shading changes by updating the Gaussian parameters $\left\{\mu_i,\Sigma_i,c_i,o_i\right\}$. A DECAF-inspired collision refinement and an adaptive sampling strategy enable precise hand-face contact modeling and the rendering of complex shadows and wrinkles. Evaluations on the multi-view DECAF dataset show improved PSNR, LPIPS, and perceptual quality for novel-view synthesis, self-reenactment, and cross-identity reenactment, suggesting strong generalization to unseen poses and identities, with implications for AR/VR and telepresence. Limitations include restricted dataset coverage and untested generalization to in-the-wild captures, which points to modeling a broader range of hand-face configurations in future work.
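
For readers unfamiliar with PSNR, the sketch below shows how the metric is commonly computed. It follows the standard definition and is not code from the paper; the function name `psnr` and the flattened-list image representation are assumptions made for illustration. LPIPS is a learned perceptual metric that relies on a pretrained network, so it is not sketched here.

```python
import math
from typing import Sequence


def psnr(rendered: Sequence[float], reference: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images.

    Both inputs are assumed to hold pixel intensities in [0, max_value].
    Higher is better; identical images give infinity.
    """
    if len(rendered) != len(reference) or len(rendered) == 0:
        raise ValueError("images must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

As a sanity check, a render that is off by a constant 0.1 everywhere (on a [0, 1] intensity scale) has a mean squared error of 0.01 and therefore a PSNR of about 20 dB.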

Abstract

With the rising interest from the community in digital avatars, coupled with the importance of expressions and gestures in communication, modeling natural avatar behavior remains an important challenge across many industries such as teleconferencing, gaming, and AR/VR. Human hands are the primary tool for interacting with the environment and are essential for realistic human behavior modeling, yet existing 3D hand and head avatar models often overlook the crucial aspect of hand-body interactions, such as those between hand and face. We present InteractAvatar, the first model to faithfully capture the photorealistic appearance of dynamic hands and non-rigid hand-face interactions. Our novel Dynamic Gaussian Hand model, which combines a template model with 3D Gaussian Splatting and a dynamic refinement module, captures pose-dependent changes, e.g., the fine wrinkles and complex shadows that occur during articulation. Importantly, our hand-face interaction module models the subtle geometry and appearance dynamics that underlie common gestures. Through experiments on novel view synthesis, self-reenactment, and cross-identity reenactment, we demonstrate that InteractAvatar can reconstruct hands and hand-face interactions from monocular or multiview videos with high-fidelity details and can be animated with novel poses.
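
The idea of combining a template mesh with Gaussians and a pose-dependent refinement can be illustrated with a small sketch. It is not the authors' implementation: it assumes each Gaussian stores its mean in the local frame of one mesh triangle, so the Gaussian follows the triangle as the mesh articulates, and that a refinement function adds a pose-dependent offset in that frame. All names (`triangle_frame`, `gaussian_world_mean`, `offset_fn`) are hypothetical, and `offset_fn` stands in for whatever learned network produces the refinement.

```python
import math
from typing import Callable, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec3) -> Vec3:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)


def triangle_frame(v0: Vec3, v1: Vec3, v2: Vec3):
    """Orthonormal frame attached to a triangle: centroid plus three axes."""
    origin = scale(add(add(v0, v1), v2), 1.0 / 3.0)
    x_axis = normalize(sub(v1, v0))                  # along the first edge
    z_axis = normalize(cross(x_axis, sub(v2, v0)))   # triangle normal
    y_axis = cross(z_axis, x_axis)                   # completes the frame
    return origin, x_axis, y_axis, z_axis


def gaussian_world_mean(triangle: Tuple[Vec3, Vec3, Vec3], local_mean: Vec3,
                        pose: Sequence[float],
                        offset_fn: Callable[[Sequence[float]], Vec3]) -> Vec3:
    """World-space mean of a Gaussian bound to a (possibly deformed) triangle.

    offset_fn is a placeholder for a learned pose-dependent refinement.
    """
    origin, x_axis, y_axis, z_axis = triangle_frame(*triangle)
    lx, ly, lz = add(local_mean, offset_fn(pose))
    return add(origin, add(add(scale(x_axis, lx), scale(y_axis, ly)),
                           scale(z_axis, lz)))
```

Because the mean is expressed in the triangle's frame, moving or rotating the triangle with a new pose moves the Gaussian with it, while the offset term lets appearance-relevant geometry vary with pose beyond what the template mesh alone provides. A full model would treat covariance, color, and opacity analogously.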

Paper Structure

This paper contains 19 sections, 19 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: We propose InteractAvatar, which enables (a) Dynamic Gaussian Hand. Our novel representation anchors 3D Gaussian kernels to a hand template mesh and uses a learnable neural network, allowing for pose-dependent articulation, self-cast shadows, and high-fidelity appearance modeling. (b) Non-Rigid Hand-Face Interaction. We introduce a learnable interaction module that refines hand-induced deformations and shading effects on the face, ensuring realistic skin contact dynamics. (c) Cross-Actor Enactment. We can transfer hand and face motions across different subjects, demonstrating the model's ability to generalize to unseen identities and gestures.
  • Figure 2: Overview of InteractAvatar. Our method combines mesh-based geometry (FLAME, MANO) with 3D Gaussian Splatting for realistic hand-face interactions. The dynamic hand appearance module refines pose-dependent deformations, wrinkles, and shadows, while the Hand-Face Interaction module enhances contact-aware geometry and shading adjustments. This enables high-fidelity animation with lifelike interactions and appearance changes.
  • Figure 3: Dynamic Gaussian Hand adapts to pose, capturing self-cast shadows, wrinkles, and shading variations. The baseline methods struggle with static hand modeling, whereas our approach preserves fine-grained details across diverse hand poses.
  • Figure 4: Qualitative Comparison of Hand-Face Interactions from Novel Views. Our method produces sharp, high-fidelity details on non-rigid facial deformations and dynamic hand appearances, outperforming baseline models like GaussianAvatars [qian2024gaussianavatars] and SplattingAvatar [shao2024splattingavatar]. Features like shadowing, wrinkles, and natural hand-face deformations are accurately reconstructed.
  • Figure 5: Self-Enactment with InteractAvatar. Our method accurately reconstructs natural hand-face interactions, preserving fine geometric details and appearance consistency in self-enactment tasks. Compared to baselines, InteractAvatar effectively models dynamic wrinkles, shadows, and subtle hand-induced deformations.
  • ...and 2 more figures