OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
Xiaozheng Zheng, Chao Wen, Zhuo Su, Zeran Xu, Zhaohu Li, Yang Zhao, Zhou Xue
TL;DR
OHTA tackles one-shot hand avatar creation from a single RGB image by learning data-driven priors through a Hand Prior Network (HPNet) and applying a two-stage reconstruction pipeline. The hand is represented as a mesh-guided implicit model with a geometry occupancy field and a multi-resolution texture field, whose priors are decomposed into geometry, identity-specific albedo, and identity-shared shadow components. In the one-shot stage, HPNet priors guide texture inversion and texture fitting, aided by view regularization and color calibration to achieve high fidelity across diverse poses and identities. The method produces robust, animatable hand avatars on multiple datasets, supports text-to-avatar and editing tasks, and offers a continuous identity latent space for interpolation, all while requiring only one input image. This approach broadens accessibility for personalized digital hands and enables practical applications in content creation and augmented reality.
Abstract
In this paper, we study the creation of one-shot hand avatars: high-fidelity, drivable hand representations obtained quickly from a single image. With the rapid growth of digital human applications, the need for quick and personalized hand avatar creation has become increasingly critical. Existing techniques typically require extensive input data and can be cumbersome or even impractical in certain scenarios. To enhance accessibility, we present OHTA (One-shot Hand avaTAr), a novel method that creates detailed hand avatars from only one image. OHTA tackles the inherent difficulties of this data-limited problem by learning and utilizing data-driven hand priors. Specifically, we design a hand prior model that is first used to 1) learn various hand priors from available data and subsequently to 2) invert and fit the target identity using this prior knowledge. OHTA creates high-fidelity hand avatars with consistent animatable quality while relying solely on a single image. Furthermore, we illustrate the versatility of OHTA through diverse applications, including text-to-avatar conversion, hand editing, and identity latent space manipulation.
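As a rough illustration of what manipulating an identity latent space can look like, the sketch below blends two identity codes and produces a path of intermediate codes. It is not the paper's implementation: the function names `lerp_identity` and `interpolation_path`, the use of plain linear interpolation, and the representation of an identity code as a flat list of floats are all assumptions made for this example. In practice, each intermediate code would be passed to the trained prior model to render an intermediate hand appearance.

```python
from typing import List, Sequence


def lerp_identity(z_a: Sequence[float], z_b: Sequence[float], t: float) -> List[float]:
    """Linearly blend two identity codes (hypothetical helper).

    t = 0 returns z_a, t = 1 returns z_b, values in between mix the two.
    """
    if len(z_a) != len(z_b):
        raise ValueError("identity codes must have the same dimension")
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_path(
    z_a: Sequence[float], z_b: Sequence[float], steps: int
) -> List[List[float]]:
    """Return `steps` evenly spaced codes from z_a to z_b, endpoints included."""
    if steps < 2:
        raise ValueError("need at least two steps to include both endpoints")
    return [lerp_identity(z_a, z_b, i / (steps - 1)) for i in range(steps)]


if __name__ == "__main__":
    # Toy 2-D codes; real identity codes would come from the learned prior.
    for code in interpolation_path([0.0, 2.0], [1.0, 4.0], steps=3):
        print(code)
```

Linear blending is the simplest choice and only makes sense if the latent space is reasonably smooth between the two endpoints; other schemes (for example spherical interpolation) could be substituted without changing the surrounding logic.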
