BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image

Minje Kim, Tae-Kyun Kim

TL;DR

BiTT tackles the challenge of reconstructing realistic, relightable textures for two interacting hands from a single image. It introduces a coarse-to-fine pipeline that combines a texture parametric model (HTML) with a novel bi-directional texture reconstructor (BTR), which exploits left–right texture symmetry to recover occluded texture regions. The method jointly learns scene lighting, albedo, and full hand textures, guided by reconstruction, albedo-consistency, and symmetry losses. Experiments on InterHand2.6M and RGB2Hands show substantial improvements over state-of-the-art methods, enabling controllable, photorealistic two-hand avatars from single-image input with end-to-end training.

Abstract

Creating personalized hand avatars is important to offer a realistic experience to users on AR / VR platforms. While most prior studies focused on reconstructing 3D hand shapes, some recent work has tackled the reconstruction of hand textures on top of shapes. However, these methods are often limited to capturing pixels on the visible side of a hand, requiring diverse views of the hand in a video or multiple images as input. In this paper, we propose a novel method, BiTT (Bi-directional Texture reconstruction of Two hands), which is the first end-to-end trainable method for relightable, pose-free texture reconstruction of two interacting hands taking only a single RGB image. It is built on three novel components: 1) bi-directional (left $\leftrightarrow$ right) texture reconstruction using the texture symmetry of left / right hands, 2) a texture parametric model for hand texture recovery, and 3) an overall coarse-to-fine pipeline for reconstructing the personalized texture of two interacting hands. BiTT first estimates the scene light condition and albedo image from an input image, then reconstructs the texture of both hands through the texture parametric model and the bi-directional texture reconstructor. In experiments using the InterHand2.6M and RGB2Hands datasets, our method significantly outperforms state-of-the-art hand texture reconstruction methods quantitatively and qualitatively. The code is available at https://github.com/yunminjin2/BiTT
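
The left / right symmetry idea in the abstract can be illustrated with a toy example. The sketch below is not the BiTT reconstructor, which is a learned network; it only shows the underlying intuition with a naive copy. It assumes, for illustration, that both hands' UV texture maps have already been brought into a shared layout so that the same texel index refers to the same skin location on either hand. The function names `usable_symmetric_mask` and `fill_from_other_hand` are hypothetical.

```python
import numpy as np

def usable_symmetric_mask(visible_self, visible_other):
    """Texels hidden on this hand but observed on the other hand."""
    return (~visible_self) & visible_other

def fill_from_other_hand(tex_self, tex_other, visible_self, visible_other):
    """Naively copy observed texels from the other hand into hidden ones.

    Assumption: both UV maps share one layout (mirroring already applied).
    """
    usable = usable_symmetric_mask(visible_self, visible_other)
    filled = tex_self.copy()
    filled[usable] = tex_other[usable]
    return filled, usable

# Toy 2x2 UV maps with 3 colour channels.
left_tex = np.zeros((2, 2, 3), dtype=np.float32)   # left hand: all black
right_tex = np.ones((2, 2, 3), dtype=np.float32)   # right hand: all white
left_vis = np.array([[True, False], [False, False]])
right_vis = np.array([[True, True], [False, False]])

filled_left, usable = fill_from_other_hand(left_tex, right_tex, left_vis, right_vis)
```

In this toy case only one texel is both hidden on the left hand and visible on the right, so only that texel is copied. Texels hidden on both hands stay unfilled, which is where a texture prior such as a parametric model would be needed.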

Paper Structure (47 sections, 12 equations, 13 figures, 4 tables)

Figures (13)

  • Figure 1: Symmetrical hand textures of different identities, taken from pairs of diametrically opposed camera views in InterHand2.6M.
  • Figure 2: The architecture of BiTT. Our method consists of three steps: (1) scene estimation, (2) the coarse stage, and (3) the fine stage. Scene estimation predicts the albedo image and lighting conditions from the given input image. Full detailed textures of both hands are reconstructed from the single image input. The hand texture parametric model is adopted in the coarse stage, then the bi-directional texture reconstruction refines the personalized hand textures using the texture symmetry of the left and right hands. Finally, we render both hands with Phong illumination.
  • Figure 3: Detailed architecture of the decoding layer in the bi-directional texture reconstruction.
  • Figure 4: The visible, invisible, and usable symmetric texture masks on the UV texture map from an image.
  • Figure 5: Qualitative results of HTML, S2Hand, HARP, and BiTT rendered in novel poses and viewpoints. The last two rows are from the RGB2Hands dataset, while the remaining rows are from the InterHand2.6M dataset.
  • ...and 8 more figures
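
The Figure 2 caption notes that the hands are rendered with Phong illumination. As a reminder of what that shading model computes, here is a minimal single-point sketch of the textbook Phong formula (ambient, diffuse, and specular terms). The coefficient values and the function name `phong` are assumptions chosen for illustration; the paper's exact lighting parameterization is not given in this summary.

```python
import numpy as np

def normalize(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)

def phong(albedo, normal, light_dir, view_dir,
          ambient=0.1, k_d=0.7, k_s=0.2, shininess=16.0):
    """Textbook Phong shading for one surface point and one light.

    light_dir and view_dir point away from the surface.
    All coefficients are illustrative assumptions.
    """
    n = normalize(normal)
    l = normalize(light_dir)
    v = normalize(view_dir)
    n_dot_l = max(float(n @ l), 0.0)
    # Mirror reflection of the light direction about the normal.
    r = 2.0 * float(n @ l) * n - l
    # No highlight when the light is behind the surface.
    spec = max(float(r @ v), 0.0) ** shininess if n_dot_l > 0.0 else 0.0
    return albedo * (ambient + k_d * n_dot_l) + k_s * spec

albedo = np.array([0.5, 0.5, 0.5])
up = np.array([0.0, 0.0, 1.0])
lit = phong(albedo, up, up, up)        # light and camera along the normal
unlit = phong(albedo, up, -up, up)     # light behind the surface
```

Because the albedo is separated from the lighting terms, changing `light_dir` re-shades the same texture, which is the sense in which an albedo-based texture is relightable.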