PALM: A Dataset and Baseline for Learning Multi-subject Hand Prior

Zicong Fan, Edoardo Remelli, David Dimond, Fadime Sener, Liuhao Ge, Bugra Tekin, Cem Keskin, Shreyas Hampali

TL;DR

PALM tackles the challenge of learning generalizable hand priors by introducing a large, diverse dataset of 13k high-quality 3D hand scans and 90k calibrated multi-view RGB images from 263 subjects, with corresponding MANO registrations. It couples this dataset with PALM-Net, a multi-subject implicit hand prior learned through physically based inverse rendering to produce relightable, personalized hand avatars from a single image. The approach disentangles geometry, appearance, and lighting using subject-specific shape and appearance codes within a canonical MANO-based space, enabling robust personalization under unknown illumination. Experimental results on synthetic and real data show PALM-Net consistently outperforms prior methods in hand avatar personalization and relighting, highlighting PALM's potential as a foundational resource for realistic hand modeling and related applications.

Abstract

The abilities to grasp objects, signal with gestures, and share emotion through touch all stem from the unique capabilities of human hands. Yet creating high-quality personalized hand avatars from images remains challenging due to complex geometry, appearance, and articulation, particularly under unconstrained lighting and limited views. Progress has also been limited by the lack of datasets that jointly provide accurate 3D geometry, high-resolution multi-view imagery, and a diverse population of subjects. To address this, we present PALM, a large-scale dataset comprising 13k high-quality hand scans from 263 subjects and 90k multi-view images, capturing rich variation in skin tone, age, and geometry. To show its utility, we present a baseline, PALM-Net, a multi-subject prior over hand geometry and material properties learned via physically based inverse rendering, enabling realistic, relightable single-image hand avatar personalization. PALM's scale and diversity make it a valuable real-world resource for hand modeling and related research.

Paper Structure

This paper contains 13 sections, 14 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Dataset overview: PALM is a large-scale dataset comprising calibrated multi-view high-resolution RGB images and 3dMD hand scans (a). It features $263$ subjects spanning a wide range of skin tones and hand sizes, $90k$ RGB images, and $13k$ high-quality hand scans with corresponding MANO registrations (b). This diversity and precision provide a foundation for learning a universal prior over human hand shape and appearance.
  • Figure 2: Capture setup. Our 3dMD setup with $7$ RGB cameras.
  • Figure 3: PALM demographics. (a) Age; (b) Height; (c) Skin tone distributions. Our dataset provides a wide distribution of skin tones and age groups representing a large variety of hand textures.
  • Figure 4: PALM-Net overview. Given (a) PALM, our multi-subject RGB dataset with $263$ subjects, PALM-Net explains each subject by optimizing subject-specific shape and appearance codes (b). (c) PALM-Net is an implicit physically-based network that is conditioned on the subject codes and renders to radiance, normal, and physically-based RGB images.
  • Figure 5: In-the-wild image personalization. (a) The first column shows the images used for personalization, followed by renderings of the geometry and materials of the hand avatar obtained using our prior model. The PBR rendering refers to physically-based rendering with the estimated environment map. The last column shows the relighting results of the personalized hand avatar in a novel pose. Our method retrieves realistic hand avatars even when the input personalization image has complex lighting effects. (b) Additional relighting results with in-the-wild images.
  • ...and 3 more figures