URHand: Universal Relightable Hands

Zhaoxi Chen, Gyeongsik Moon, Kaiwen Guo, Chen Cao, Stanislav Pidhorskyi, Tomas Simon, Rohan Joshi, Yuan Dong, Yichen Xu, Bernardo Pires, He Wen, Lucas Evans, Bo Peng, Julia Buffalini, Autumn Trimble, Kevyn McPhail, Melissa Schoeller, Shoou-I Yu, Javier Romero, Michael Zollhöfer, Yaser Sheikh, Ziwei Liu, Shunsuke Saito

TL;DR

URHand introduces a universal relightable hand model that generalizes across viewpoints, poses, illuminations, and identities. It combines a physically based geometry/refinement path with a neural relighting path through a spatially varying linear lighting model that preserves light transport linearity, enabling real-time relighting under continuous illuminations. A hybrid neural-physical framework, reinforced by a lighting-aware adversarial loss and L1 feature regularization, achieves high fidelity and robust generalization, including quick personalization from a casual phone scan. The approach is validated via extensive ablations and cross-identity comparisons, demonstrating superior performance over prior methods and offering a practical path to photorealistic, relightable hands in interactive applications.

Abstract

Existing photorealistic relightable hand models require extensive identity-specific observations in different views, poses, and illuminations, and face challenges in generalizing to natural illuminations and novel identities. To bridge this gap, we present URHand, the first universal relightable hand model that generalizes across viewpoints, poses, illuminations, and identities. Our model allows few-shot personalization using images captured with a mobile phone, and is ready to be photorealistically rendered under novel illuminations. To simplify the personalization process while retaining photorealism, we build a powerful universal relightable prior based on neural relighting from multi-view images of hands captured in a light stage with hundreds of identities. The key challenge is scaling the cross-identity training while maintaining personalized fidelity and sharp details without compromising generalization under natural illuminations. To this end, we propose a spatially varying linear lighting model as the neural renderer that takes physics-inspired shading as input feature. By removing non-linear activations and bias, our specifically designed lighting model explicitly keeps the linearity of light transport. This enables single-stage training from light-stage data while generalizing to real-time rendering under arbitrary continuous illuminations across diverse identities. In addition, we introduce the joint learning of a physically based model and our neural relighting model, which further improves fidelity and generalization. Extensive experiments show that our approach achieves superior performance over existing methods in terms of both quality and generalizability. We also demonstrate quick personalization of URHand from a short phone scan of an unseen identity.
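The linearity argument in the abstract can be checked numerically. The sketch below is not the authors' implementation. It assumes a hypothetical per-texel weight tensor (`weights`), made-up grid and channel sizes, and random stand-ins for the physics-inspired shading features. It only illustrates that a map with no bias and no activation obeys superposition over lights, while the same map with a bias and a ReLU does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical texel grid and channel sizes, chosen only for illustration.
H, W, C_IN, C_OUT = 4, 4, 6, 3

# Hypothetical per-texel weights. In a spatially varying model these would be
# predicted from pose and identity, never from the lighting itself.
weights = rng.normal(size=(H, W, C_OUT, C_IN))
bias = rng.normal(size=(H, W, C_OUT))


def linear_lighting(shading):
    """Per-texel linear map with no bias and no activation."""
    return np.einsum("hwoc,hwc->hwo", weights, shading)


def nonlinear_lighting(shading):
    """Same map with a bias and a ReLU added, for contrast."""
    return np.maximum(linear_lighting(shading) + bias, 0.0)


# Random stand-ins for shading features under two separate lights.
shade_a = rng.uniform(size=(H, W, C_IN))
shade_b = rng.uniform(size=(H, W, C_IN))

# Superposition: rendering a weighted mix of lights should equal the
# weighted mix of the individual renderings.
mixed_input = 2.0 * shade_a + 0.5 * shade_b

linear_ok = np.allclose(
    linear_lighting(mixed_input),
    2.0 * linear_lighting(shade_a) + 0.5 * linear_lighting(shade_b),
)
nonlinear_ok = np.allclose(
    nonlinear_lighting(mixed_input),
    2.0 * nonlinear_lighting(shade_a) + 0.5 * nonlinear_lighting(shade_b),
)

print("linear model obeys superposition:", linear_ok)
print("bias + ReLU model obeys superposition:", nonlinear_ok)
```

Superposition is what would let a model trained on discrete light-stage lights be evaluated under a continuous environment map, since such a map can be treated as a weighted sum of individual lights. This holds only if the input shading features are themselves linear in light intensity, which the sketch assumes.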

Paper Structure

This paper contains 22 sections, 13 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: URHand (a.k.a. Your Hand). Our model is a high-fidelity Universal prior for Relightable Hands built upon light-stage data. It generalizes to novel viewpoints, poses, identities, and illuminations, which enables quick personalization from a phone scan.
  • Figure 2: Overview of URHand. Our model takes as input a mean texture ${\mathcal{T}}$, hand pose $\theta$, and a coarse mesh ${\mathcal{M}}$ for each identity. The physical branch (Sec. \ref{sec:phys-refiner}) focuses on refining geometry and providing accurate shading features for the neural branch (Sec. \ref{sec:linear-model}). The core of the neural branch is the linear lighting model, which takes as input the physics-inspired shading features from the physical branch. The neural branch learns to predict the gain and bias map over the mean texture. We leverage a differentiable rasterizer for rendering and minimize the loss of both branches against ground truth images (Sec. \ref{sec:training}). $sg(\cdot)$ denotes the stop-gradient operation.
  • Figure 3: Qualitative comparisons on sequences with grouped lights. We evaluate our method for both per-subject optimization and novel identity generalization against comparison methods. a) All methods are evaluated on unseen segments of a training identity. b) All methods are evaluated on an identity unseen during training.
  • Figure 4: Ablation study on the design of the linear lighting model. Our spatially varying linear lighting model produces realistic renderings, while the baseline methods fail to correctly model shadows or tend to be over-smoothed.
  • Figure 5: Ablation studies on the impact of lighting features and geometry refinement. Notably, our full model can produce fine-grained geometry such as wrinkles and nails, as well as specular highlights (e.g., on the little finger).
  • ...and 4 more figures