Authentic Hand Avatar from a Phone Scan via Universal Hand Model

Gyeongsik Moon, Weipeng Xu, Rohan Joshi, Chenglei Wu, Takaaki Shiratori

TL;DR

This work introduces the Universal Hand Model (UHM), a high-fidelity 3D hand model capable of representing arbitrary IDs and adapting to individuals from short phone scans to deliver authentic hand avatars. Unlike prior pipelines that separate tracking and modeling, UHM performs simultaneous tracking and modeling using a learned ID and pose decomposition with corrective terms, coupled with a novel image matching loss to prevent skin sliding. An end-to-end adaptation pipeline leverages priors from UHM, ShadowNet-based shadow removal, UV texture unwrapping, and perceptual texture optimization to produce highly authentic personalized hand avatars, demonstrated across studio and phone-scan datasets and outperforming existing universal and personalized hand models. The approach enables realistic, pose-controllable hand avatars suitable for AR/VR and interactive applications, with practical speed relative to studio-based methods and robust generalization to unseen identities and poses.

Abstract

An authentic 3D hand avatar that captures all identifiable information, such as hand shape and texture, is necessary for immersive experiences in AR/VR. In this paper, we present a universal hand model (UHM), which 1) can universally represent high-fidelity 3D hand meshes of arbitrary identities (IDs) and 2) can be adapted to each person from a short phone scan to produce an authentic hand avatar. For effective universal hand modeling, we perform tracking and modeling at the same time, whereas previous 3D hand models perform them separately. The conventional separate pipeline suffers from errors accumulated in the tracking stage, which cannot be recovered in the modeling stage. In contrast, ours does not suffer from accumulated errors and has a much more concise overall pipeline. We additionally introduce a novel image matching loss function to address skin sliding during tracking and modeling, a problem that existing works have largely not focused on. Finally, using learned priors from our UHM, we effectively adapt our UHM to each person's short phone scan to obtain an authentic hand avatar.

Paper Structure (43 sections, 1 equation, 30 figures, 6 tables)

Figures (30)

  • Figure 1: We introduce (a) UHM, which can universally represent arbitrary IDs of hands at a high fidelity. Our adaptation pipeline fits pre-trained UHM to a phone scan, which produces (b) an animatable authentic 3D hand avatar. Images of (b) are rendered using our adapted hand avatar with the Phong reflection model and environment maps [gardner2017learning, hold2019deep].
  • Figure 2: The effectiveness of the correctives.
  • Figure 3: The overall pipeline of the proposed UHM. The estimated correctives (dotted green box at the bottom) are applied to a template mesh to refine it. Then, LBS is used to pose the template mesh.
  • Figure 4: (a) Reference texture. (b) Our image matching loss function encourages rasterized vertices (orange) to move to the target positions (green), where the target position is obtained by the optical flow (white arrow).
  • Figure 5: The overall pipeline to remove the shadow from the phone scan using our ShadowNet.
  • ...and 25 more figures
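
The Figure 3 caption describes two steps: correctives refine a template mesh, and linear blend skinning (LBS) then poses it. The following is a minimal sketch of those two steps in NumPy, not the authors' implementation. It assumes the correctives are additive per-vertex offsets, and the function names, array shapes, and the toy two-joint example are all hypothetical choices made for illustration.

```python
import numpy as np


def apply_correctives(template, offsets):
    """Refine the template by adding per-vertex offsets (assumed additive form)."""
    return template + offsets


def linear_blend_skinning(vertices, skin_weights, joint_transforms):
    """Pose vertices with standard linear blend skinning.

    vertices:         (V, 3) rest-pose vertex positions
    skin_weights:     (V, J) per-vertex weights, each row summing to 1
    joint_transforms: (J, 4, 4) transforms from rest pose to posed space
    """
    # Lift vertices to homogeneous coordinates: (V, 4)
    ones = np.ones((vertices.shape[0], 1))
    v_h = np.concatenate([vertices, ones], axis=1)
    # Blend the joint transforms per vertex: (V, 4, 4)
    blended = np.einsum("vj,jab->vab", skin_weights, joint_transforms)
    # Apply each vertex's blended transform and drop the homogeneous coordinate
    posed_h = np.einsum("vab,vb->va", blended, v_h)
    return posed_h[:, :3]


# Toy example (hypothetical data): three vertices on a line, two joints.
template = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
offsets = np.zeros_like(template)  # no refinement in this toy case
weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])

transforms = np.stack([np.eye(4), np.eye(4)])
transforms[1, 2, 3] = 1.0  # joint 1 translates by +1 along z

refined = apply_correctives(template, offsets)
posed = linear_blend_skinning(refined, weights, transforms)
print(posed)
```

In this toy case the middle vertex is weighted equally between the static joint and the translated joint, so it moves half as far as the last vertex. Applying the offsets before skinning mirrors the order given in the caption, where the refined template is what gets posed.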