
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors

Xiaozheng Zheng, Chao Wen, Zhuo Su, Zeran Xu, Zhaohu Li, Yang Zhao, Zhou Xue

TL;DR

OHTA tackles one-shot hand avatar creation from a single RGB image by learning data-driven priors through a Hand Prior Network (HPNet) and applying a two-stage reconstruction pipeline. The hand is represented as a mesh-guided implicit model with a geometry occupancy field and a multi-resolution texture field, whose priors are decomposed into geometry, identity-specific albedo, and identity-shared shadow components. In the one-shot stage, HPNet priors guide texture inversion and texture fitting, aided by view-regularization and color calibration to achieve high fidelity across diverse poses and identities. The method demonstrates robust, animatable hand avatars on multiple datasets, supports text-to-avatar and editing tasks, and offers a continuous identity latent space for interpolation, all while requiring only one input image. This approach broadens accessibility for personalized digital hands and enables practical applications in content creation and augmented reality.
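The one-shot stage described above first inverts the input image into an identity code and then fits the network to recover details (the Figure 2 caption gives this order). The sketch below illustrates that "invert, then fit" pattern on a toy problem only. The linear decoder, the mean-squared-error loss, the learning rates, and the names `decode`, `invert_code`, and `fit_decoder` are all illustrative assumptions; they stand in for HPNet and are not the authors' network, losses, or code.

```python
from typing import List


def decode(z: float, weights: List[float], bias: List[float]) -> List[float]:
    """Toy stand-in for a prior network: maps a scalar identity code to 'texture' values."""
    return [w * z + b for w, b in zip(weights, bias)]


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def invert_code(z: float, weights: List[float], bias: List[float],
                target: List[float], lr: float = 0.1, steps: int = 200) -> float:
    """Inversion step: decoder frozen, only the identity code z is optimized."""
    n = len(target)
    for _ in range(steps):
        residual = [p - t for p, t in zip(decode(z, weights, bias), target)]
        grad_z = 2.0 / n * sum(r * w for r, w in zip(residual, weights))
        z -= lr * grad_z
    return z


def fit_decoder(z: float, weights: List[float], bias: List[float],
                target: List[float], lr: float = 0.5, steps: int = 200) -> List[float]:
    """Fitting step: code frozen, decoder parameters updated (only the bias, for brevity)."""
    n = len(target)
    bias = list(bias)  # work on a copy so the caller's prior is left untouched
    for _ in range(steps):
        residual = [p - t for p, t in zip(decode(z, weights, bias), target)]
        bias = [b - lr * 2.0 / n * r for b, r in zip(bias, residual)]
    return bias


if __name__ == "__main__":
    weights, bias, target = [1.0, 0.5, -0.5], [0.0, 0.0, 0.0], [0.9, 0.7, 0.2]
    z = invert_code(0.0, weights, bias, target)
    print("loss after inversion:", mse(decode(z, weights, bias), target))
    fitted = fit_decoder(z, weights, bias, target)
    print("loss after fitting:  ", mse(decode(z, weights, fitted), target))
```

In this toy setting, inversion can only reach the best result the frozen decoder is able to express, and the fitting step then removes the remaining error by adjusting the decoder itself. That split is the reason for running the two steps in sequence.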

Abstract

In this paper, we delve into the creation of one-shot hand avatars, attaining high-fidelity and drivable hand representations swiftly from a single image. With the burgeoning domains of the digital human, the need for quick and personalized hand avatar creation has become increasingly critical. Existing techniques typically require extensive input data and may prove cumbersome or even impractical in certain scenarios. To enhance accessibility, we present a novel method OHTA (One-shot Hand avaTAr) that enables the creation of detailed hand avatars from merely one image. OHTA tackles the inherent difficulties of this data-limited problem by learning and utilizing data-driven hand priors. Specifically, we design a hand prior model initially employed for 1) learning various hand priors with available data and subsequently for 2) the inversion and fitting of the target identity with prior knowledge. OHTA demonstrates the capability to create high-fidelity hand avatars with consistent animatable quality, solely relying on a single image. Furthermore, we illustrate the versatility of OHTA through diverse applications, encompassing text-to-avatar conversion, hand editing, and identity latent space manipulation.

Paper Structure

This paper contains 33 sections, 5 equations, 29 figures, 5 tables.

Figures (29)

  • Figure 1: We introduce a novel approach capable of creating implicit animatable hand avatars using just a single image. Our framework facilitates 1) text-to-avatar conversion, 2) hand texture and geometry editing, and 3) interpolation and sampling within the latent space.
  • Figure 2: The two-stage framework of OHTA (above) and the Hand Prior Network (below). In stage 1, OHTA optimizes the identity code and HPNet to capture various hand priors. In stage 2, OHTA first optimizes the identity code for texture inversion, then optimizes HPNet for texture fitting to capture details. HPNet consists of three fields (albedo, shadow, and occupancy) that capture transferable hand prior knowledge. The albedo, shadow, and occupancy values are combined through volume rendering to produce the final shaded color images.
  • Figure 3: Multi-resolution Field. For simplicity, two resolutions are shown as an example. $\oplus$ denotes feature concatenation.
  • Figure 4: Qualitative comparison with state-of-the-art methods on InterHand2.6M (Moon et al., 2020). The black box indicates the input image.
  • Figure 5: Comparison of novel view synthesis on InterHand2.6M (Moon et al., 2020). The green box indicates the input view.
  • ...and 24 more figures
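
The Figure 2 caption states that albedo, shadow, and occupancy values are combined through volume rendering to give the shaded image. A minimal sketch of one way to do this for a single ray follows. It assumes that the shaded color of a sample is its albedo scaled by a scalar shadow value, and that occupancy acts directly as per-sample opacity in front-to-back compositing; neither assumption is confirmed by the text here, and the function name `composite_ray` is hypothetical.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_ray(albedo: Sequence[RGB],
                  shadow: Sequence[float],
                  occupancy: Sequence[float]) -> RGB:
    """Front-to-back compositing of shaded samples along one ray.

    Samples are ordered from nearest to farthest. Occupancy values are
    expected in [0, 1] and are used as per-sample opacity (an assumption).
    """
    if not (len(albedo) == len(shadow) == len(occupancy)):
        raise ValueError("all per-sample inputs must have the same length")

    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of the ray not yet blocked by nearer samples
    for a, s, o in zip(albedo, shadow, occupancy):
        weight = transmittance * o
        for c in range(3):
            # Assumed shading model: albedo scaled by the shadow value.
            pixel[c] += weight * a[c] * s
        transmittance *= 1.0 - o
    return (pixel[0], pixel[1], pixel[2])


if __name__ == "__main__":
    # A half-occupied, half-shadowed red sample in front of a solid green one.
    print(composite_ray(
        albedo=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
        shadow=[0.5, 1.0],
        occupancy=[0.5, 1.0],
    ))
```

Keeping albedo and shadow as separate inputs mirrors the decomposition in the caption: the shadow values can change with pose while the albedo stays tied to the identity.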