XHand: Real-time Expressive Hand Avatar
Qijun Gan, Zijie Zhou, Jianke Zhu
TL;DR
XHand addresses real-time rendering of expressive, high-fidelity hand avatars. It uses three feature embedding modules to predict vertex displacements, albedo, and linear blend skinning (LBS) weights on a subdivided MANO template, and pairs them with a mesh-based neural renderer and a part-aware Laplace smoothing regularizer. On InterHand2.6M and DeepHandMesh, the approach achieves state-of-the-art rendering quality and geometry fidelity at real-time speeds, outperforming prior volumetric and mesh-based methods. The resulting detailed, pose-consistent hand avatars target XR and gaming applications, with training and rendering handled in an efficient end-to-end pipeline.
Abstract
Hand avatars play a pivotal role in a wide array of digital interfaces, enhancing user immersion and facilitating natural interaction within virtual environments. While previous studies have focused on photo-realistic hand rendering, little attention has been paid to reconstructing hand geometry with fine details, which is essential to rendering quality. In extended reality and gaming, on-the-fly rendering is imperative. To this end, we introduce an expressive hand avatar, named XHand, that is designed to comprehensively generate hand shape, appearance, and deformations in real time. To obtain fine-grained hand meshes, we use three feature embedding modules to predict hand deformation displacements, albedo, and linear blend skinning weights, respectively. To achieve photo-realistic hand rendering on fine-grained meshes, our method employs a mesh-based neural renderer that leverages mesh topological consistency and latent codes from the embedding modules. During training, we propose a part-aware Laplace smoothing strategy that applies distinct levels of regularization to preserve necessary details while eliminating undesired artifacts. Experimental evaluations on the InterHand2.6M and DeepHandMesh datasets demonstrate the efficacy of XHand, which recovers high-fidelity geometry and texture for hand animations across diverse poses in real time. To reproduce our results, we will make the full implementation publicly available at https://github.com/agnJason/XHand.
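The abstract names two mechanisms without giving formulas: skinning a displaced template with predicted blend weights, and a Laplace smoothing term whose strength varies by hand part. The sketch below illustrates both in plain Python under stated assumptions. It uses textbook linear blend skinning and a uniform graph Laplacian, which may differ from the formulation in the paper. The function names (`skin_vertices`, `part_aware_laplacian_loss`), the part labels, and the weight values are hypothetical and are not taken from the XHand code.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def skin_vertices(
    template: Sequence[Vec3],
    displacements: Sequence[Vec3],
    skin_weights: Sequence[Sequence[float]],
    rotations: Sequence[Mat3],
    translations: Sequence[Vec3],
) -> List[Vec3]:
    """Textbook LBS: v'_i = sum_j w_ij * (R_j (v_i + d_i) + t_j)."""
    posed = []
    for v, d, w_row in zip(template, displacements, skin_weights):
        # Apply the per-vertex displacement in the rest pose first.
        p = [v[k] + d[k] for k in range(3)]
        out = [0.0, 0.0, 0.0]
        # Blend the rigid transform of every joint by its skinning weight.
        for w, R, t in zip(w_row, rotations, translations):
            for r in range(3):
                out[r] += w * (sum(R[r][c] * p[c] for c in range(3)) + t[r])
        posed.append((out[0], out[1], out[2]))
    return posed


def part_aware_laplacian_loss(
    vertices: Sequence[Vec3],
    edges: Sequence[Tuple[int, int]],
    vertex_part: Sequence[str],
    part_weight: Dict[str, float],
) -> float:
    """Mean of w_part(i) * ||v_i - mean(neighbors of i)||^2 over all vertices."""
    n = len(vertices)
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    total = 0.0
    for i, v in enumerate(vertices):
        if not neighbors[i]:
            continue  # isolated vertices contribute nothing
        count = len(neighbors[i])
        centroid = [sum(vertices[j][k] for j in neighbors[i]) / count for k in range(3)]
        sq_dist = sum((v[k] - centroid[k]) ** 2 for k in range(3))
        # Assumption: one scalar weight per part; a larger weight smooths more.
        total += part_weight[vertex_part[i]] * sq_dist
    return total / n


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    posed = skin_vertices(
        template=[(1.0, 0.0, 0.0)],
        displacements=[(0.0, 0.0, 0.25)],
        skin_weights=[[0.5, 0.5]],
        rotations=[identity, rot_z_90],
        translations=[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    )
    print(posed)

    loss = part_aware_laplacian_loss(
        vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
        edges=[(0, 1), (1, 2)],
        vertex_part=["palm", "finger", "finger"],
        part_weight={"palm": 2.0, "finger": 0.5},
    )
    print(loss)
```

In a training loop, a term like this would typically be added to the rendering loss, with larger weights on regions where artifacts are more harmful than lost detail. Which parts receive which weights is a design choice that the abstract does not specify.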
