InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians
Kefan Chen, Sergiu Oprea, Justin Theiss, Sreyas Mohan, Srinath Sridhar, Aayush Prakash
TL;DR
InteractAvatar tackles the challenge of rendering photorealistic hand-face interactions in digital avatars. It introduces a hybrid mesh-Gaussian avatar that couples a Dynamic Gaussian Hand with a learnable Hand-Face Interaction module. The Gaussians are anchored to FLAME and MANO meshes, and per-Gaussian MLPs capture pose-dependent changes in geometry and shading through the Gaussian parameters $\left\{\mu_i,\Sigma_i,c_i,o_i\right\}$. A DECAF-inspired collision refinement and an adaptive sampling strategy enable precise hand-face contact modeling and rendering of complex shadows and wrinkles. Evaluations on the multi-view DECAF dataset show improved PSNR, LPIPS, and perceptual quality for novel view synthesis, self-reenactment, and cross-identity reenactment, demonstrating generalization to unseen poses and identities, with implications for AR/VR and telepresence. Limitations include dataset coverage and generalization to in-the-wild data, which suggests broader modeling of hand-face configurations as future work.
Abstract
With rising community interest in digital avatars and the importance of expressions and gestures in communication, modeling natural avatar behavior remains an important challenge across many industries such as teleconferencing, gaming, and AR/VR. Human hands are the primary tool for interacting with the environment and are essential for realistic human behavior modeling, yet existing 3D hand and head avatar models often overlook the crucial aspect of hand-body interactions, such as those between hand and face. We present InteractAvatar, the first model to faithfully capture the photorealistic appearance of dynamic hands and non-rigid hand-face interactions. Our novel Dynamic Gaussian Hand model, which combines a template model and 3D Gaussian Splatting with a dynamic refinement module, captures pose-dependent changes, e.g., the fine wrinkles and complex shadows that occur during articulation. Importantly, our hand-face interaction module models the subtle geometry and appearance dynamics that underlie common gestures. Through experiments on novel view synthesis, self-reenactment, and cross-identity reenactment, we demonstrate that InteractAvatar can reconstruct hands and hand-face interactions from monocular or multi-view videos with high-fidelity details and can be animated with novel poses.
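To make the idea of combining a template mesh with Gaussians and a pose-dependent refinement more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each Gaussian is attached to a parent triangle of the template mesh and stores its mean in that triangle's local frame, so it follows the mesh as the mesh is posed. It also assumes, as a simplification, a single shared MLP with a learned per-Gaussian code in place of separate per-Gaussian MLPs. All names (`triangle_frames`, `anchor_gaussians`, `RefinementMLP`), the layer sizes, and the choice to predict only mean and color offsets are hypothetical.

```python
import numpy as np

def triangle_frames(verts, faces):
    """Per-triangle origin (centroid) and orthonormal frame (columns = axes)."""
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    origin = (v0 + v1 + v2) / 3.0
    e1 = v1 - v0
    e1 = e1 / np.linalg.norm(e1, axis=1, keepdims=True)
    n = np.cross(e1, v2 - v0)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    e2 = np.cross(n, e1)
    return origin, np.stack([e1, e2, n], axis=2)

def anchor_gaussians(verts, faces, parent, local_mu):
    """Map local Gaussian means to world space via their parent triangles."""
    origin, frame = triangle_frames(verts, faces)
    return origin[parent] + np.einsum("nij,nj->ni", frame[parent], local_mu)

class RefinementMLP:
    """Assumed stand-in for the dynamic refinement: pose -> (d_mu, d_color)."""

    def __init__(self, pose_dim, n_gauss, code_dim=8, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.code = rng.normal(0.0, 0.1, (n_gauss, code_dim))  # per-Gaussian code
        self.w1 = rng.normal(0.0, 0.1, (pose_dim + code_dim, hidden))
        self.b1 = np.zeros(hidden)
        # Zero-initialised output layer: the refinement is a no-op before training.
        self.w2 = np.zeros((hidden, 6))
        self.b2 = np.zeros(6)

    def __call__(self, pose):
        pose_tiled = np.tile(pose, (len(self.code), 1))
        x = np.concatenate([pose_tiled, self.code], axis=1)
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU
        out = h @ self.w2 + self.b2
        return out[:, :3], out[:, 3:]

# Toy usage: one triangle, two Gaussians, a 4-D placeholder pose vector.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])
parent = np.array([0, 0])
local_mu = np.array([[0.0, 0.0, 0.1], [0.2, 0.0, 0.0]])

mlp = RefinementMLP(pose_dim=4, n_gauss=2)
d_mu, d_color = mlp(np.zeros(4))
mu_world = anchor_gaussians(verts, faces, parent, local_mu + d_mu)
```

In this sketch the mesh carries the coarse articulation, while the MLP only has to explain the residual, pose-dependent detail. The zero-initialised output layer is a common design choice that lets training start from the plain mesh-anchored Gaussians.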
