Authentic Hand Avatar from a Phone Scan via Universal Hand Model
Gyeongsik Moon, Weipeng Xu, Rohan Joshi, Chenglei Wu, Takaaki Shiratori
TL;DR
This work introduces the Universal Hand Model (UHM), a high-fidelity 3D hand model that can represent arbitrary identities (IDs) and can be adapted to an individual from a short phone scan to produce an authentic hand avatar. Unlike prior pipelines that separate tracking and modeling, UHM performs both simultaneously. It uses a learned decomposition of ID and pose with corrective terms, together with a novel image matching loss that prevents skin sliding. An end-to-end adaptation pipeline combines priors from UHM, ShadowNet-based shadow removal, UV texture unwrapping, and perceptual texture optimization to produce highly authentic personalized hand avatars. Experiments on studio and phone-scan datasets show that it outperforms existing universal and personalized hand models. The approach enables realistic, pose-controllable hand avatars for AR/VR and other interactive applications. It is practical in speed compared with studio-based methods and generalizes robustly to unseen identities and poses.
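To make the idea of an "ID and pose decomposition with corrective terms" concrete, the sketch below uses a MANO-style parametric mesh. A template mesh receives an identity offset and a pose-dependent corrective offset, and the result is posed with linear blend skinning (LBS). This is a minimal NumPy illustration under stated assumptions, not UHM's architecture. In this sketch, the ID offset is an arbitrary array, the pose corrective is a random linear basis driven by (R_j - I) as in MANO, and skinning ignores the kinematic chain (each joint rotates about its own rest location). The function and parameter names (`pose_hand`, `pose_corrective`, `corr_basis`) are hypothetical.

```python
import numpy as np

def axis_angle_to_matrix(aa):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    angle = np.linalg.norm(aa)
    if angle < 1e-8:
        return np.eye(3)
    k = aa / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def pose_corrective(rotations, basis):
    """Hypothetical linear pose corrective (MANO-style), driven by R_j - I.

    rotations: (J, 3, 3); basis: (J * 9, V * 3). Returns offsets of shape (V, 3).
    The offsets vanish at the rest pose because every R_j - I is zero there.
    """
    feats = (rotations - np.eye(3)).reshape(-1)
    return (feats @ basis).reshape(-1, 3)

def pose_hand(template, id_offset, corr_basis, skin_weights, joints, poses):
    """Toy ID/pose decomposition: template + ID offset + pose corrective, then LBS.

    template, id_offset: (V, 3); skin_weights: (V, J) with rows summing to 1;
    joints: (J, 3) rest joint locations; poses: (J, 3) axis-angle per joint.
    Simplification (assumption): no kinematic chain; each joint rotates
    about its own rest location.
    """
    rotations = np.stack([axis_angle_to_matrix(p) for p in poses])
    rest = template + id_offset + pose_corrective(rotations, corr_basis)
    posed = np.zeros_like(rest)
    for j, (R, c) in enumerate(zip(rotations, joints)):
        # Rigidly rotate the whole rest mesh about joint j, then blend by weight.
        posed += skin_weights[:, j:j + 1] * ((rest - c) @ R.T + c)
    return posed

# Usage with random toy data (all values are placeholders, not real hand data).
rng = np.random.default_rng(0)
V, J = 100, 4
template = rng.normal(size=(V, 3))
id_offset = 0.01 * rng.normal(size=(V, 3))
corr_basis = 0.001 * rng.normal(size=(J * 9, V * 3))
weights = rng.random((V, J))
weights /= weights.sum(axis=1, keepdims=True)
joints = rng.normal(size=(J, 3))

mesh = pose_hand(template, id_offset, corr_basis, weights, joints, np.zeros((J, 3)))
# At the rest pose the corrective vanishes and LBS reduces to the identity,
# so the mesh is exactly template + id_offset.
assert np.allclose(mesh, template + id_offset)
```

Keeping identity and pose in separate terms is what lets a model of this kind reuse one pose space across many identities. The corrective term absorbs pose-dependent deformations that plain skinning cannot express.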
Abstract
An authentic 3D hand avatar that captures all identifiable information, such as hand shape and texture, is necessary for immersive experiences in AR/VR. In this paper, we present a universal hand model (UHM), which 1) can universally represent high-fidelity 3D hand meshes of arbitrary identities (IDs) and 2) can be adapted to each person with a short phone scan to obtain an authentic hand avatar. For effective universal hand modeling, we perform tracking and modeling at the same time, whereas previous 3D hand models perform them separately. The conventional separate pipeline suffers from errors accumulated in the tracking stage, which cannot be recovered in the modeling stage. In contrast, our pipeline does not suffer from such accumulated errors and is much more concise overall. We additionally introduce a novel image matching loss function to address skin sliding during tracking and modeling, an issue that existing works have largely overlooked. Finally, using the learned priors of our UHM, we effectively adapt it to each person's short phone scan to produce an authentic hand avatar.
