StrandHead: Text to Hair-Disentangled 3D Head Avatars Using Human-Centric Priors

Xiaokun Sun, Zeyu Cai, Ying Tai, Jian Yang, Zhenyu Zhang

TL;DR

StrandHead is a text-driven method that generates 3D hair strands and hair-disentangled head avatars with strand-level attributes. It introduces a meshing approach guided by strand geometry so that gradients flow from the distillation objective to the neural strand representation.

Abstract

While a haircut conveys distinct personality, existing avatar generation methods fail to model practical hair because of limited data or entangled representations. We propose StrandHead, a novel text-driven method capable of generating 3D hair strands and disentangled head avatars with strand-level attributes. Instead of using large-scale hair-text paired data for supervision, we demonstrate that realistic hair strands can be generated from prompts by distilling 2D generative models pre-trained on human mesh data. To this end, we propose a meshing approach guided by strand geometry to guarantee the gradient flow from the distillation objective to the neural strand representation. The optimization is then regularized by statistically significant haircut features, which keeps strand updates stable and prevents unreasonable drifting. These 2D/3D human-centric priors contribute to text-aligned and realistic 3D strand generation. Extensive experiments show that StrandHead achieves state-of-the-art performance on text-to-strand generation and disentangled 3D head avatar modeling. The generated 3D hair can be applied to avatars for strand-level editing, and used in graphics engines for physical simulation or other applications. Project page: https://xiaokunsun.github.io/StrandHead.github.io/.

Paper Structure

This paper contains 22 sections, 17 equations, 24 figures, 3 tables.

Figures (24)

  • Figure 1: StrandHead includes three stages: (a) We first create a FLAME-aligned 3D bald head using the improved HumanNorm. (b) Next, we introduce a differentiable prismatization algorithm to enable human-specific geometry-aware 2D diffusion models to supervise hair shape modeling. Additionally, two losses inspired by 3D hair geometric priors are applied to further regularize the hair geometry. (c) Finally, we use a human-specific normal-conditioned 2D diffusion model to generate lifelike hair textures.
  • Figure 2: The differentiable prismatization algorithm: (a) gradient backpropagation process, (b) a strand-to-mesh conversion example, and (c) advantages over NeuralHaircut. Non-watertight quad meshes can easily produce ambiguous normal maps, which significantly reduce the stability of hair shape modeling (see the drifting hair highlighted by the oval dotted box in (c)).
  • Figure 3: Observations on hair geometric features: (1) neighboring strand orientations are highly consistent; (2) strand curvature is strongly and positively related to haircut curliness.
  • Figure 4: Examples of high-fidelity and diverse 3D heads and strand-accurate haircuts generated by our method. The upper visualization includes rendered color and normal maps of the head and hair prismatic meshes. The lower visualization shows the physics-based hair strand rendering result using Blender. For better strand-based visualization, we interpolate generated hair to approximately 10,000 strands and apply a consistent appearance. Please zoom in for detailed views, and refer to the Supp. Mat. for video demonstrations.
  • Figure 5: Qualitative comparisons with SOTA methods. Since TECA uses a vanilla NeRF to represent hair, rendering normals is not supported. HAAR generates only the geometry of hair strands, so we first convert the strands into prismatic meshes using differentiable prismatization and then use TEXTure to generate textures for visualization and comparison.
  • ...and 19 more figures
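
The captions of Figures 2 and 3 describe converting strands into watertight prismatic meshes and two geometric observations (orientation consistency between neighboring strands, and curvature as a proxy for curliness). The sketch below illustrates these ideas in NumPy under stated assumptions. It is not the authors' implementation: the function names (`strand_to_prism`, `is_watertight`, `signed_volume`, `orientation_consistency`, `mean_curvature`), the triangular cross-section, the `radius` parameter, and the per-point frame construction are all hypothetical choices made for illustration. Because every vertex is an affine function of the strand points and a local frame, the same construction could in principle be written in an autodiff framework so that gradients reach the strand points.

```python
import numpy as np

def _unit(x):
    """Normalize vectors along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def strand_to_prism(points, radius=0.01):
    """Turn one strand polyline (N, 3) into a closed triangular-prism mesh.

    Assumed construction (not from the paper): a 3-vertex ring is placed
    around every strand point, rings are joined by side triangles, and both
    ends are capped so the mesh is watertight.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    tangents = _unit(np.gradient(points, axis=0))
    # Helper axis: the coordinate axis least aligned with each tangent.
    helper = np.eye(3)[np.argmin(np.abs(tangents), axis=1)]
    u = _unit(np.cross(tangents, helper))
    v = np.cross(tangents, u)  # (u, v, tangent) is a right-handed frame
    angles = 2.0 * np.pi * np.arange(3) / 3.0
    ring = points[:, None, :] + radius * (
        np.cos(angles)[None, :, None] * u[:, None, :]
        + np.sin(angles)[None, :, None] * v[:, None, :]
    )
    verts = ring.reshape(-1, 3)  # vertex index = 3 * i + k
    faces = [[0, 2, 1]]  # start cap, facing against the tangent
    for i in range(n - 1):
        for k in range(3):
            a, b = 3 * i + k, 3 * i + (k + 1) % 3
            c, d = 3 * (i + 1) + (k + 1) % 3, 3 * (i + 1) + k
            faces += [[a, b, c], [a, c, d]]  # one side quad as two triangles
    last = 3 * (n - 1)
    faces.append([last, last + 1, last + 2])  # end cap, facing along the tangent
    return verts, np.array(faces)

def is_watertight(faces):
    """True if every undirected edge is shared by exactly two triangles."""
    counts = {}
    for f in faces:
        for j in range(3):
            e = tuple(sorted((int(f[j]), int(f[(j + 1) % 3]))))
            counts[e] = counts.get(e, 0) + 1
    return all(c == 2 for c in counts.values())

def signed_volume(verts, faces):
    """Signed volume of a closed triangle mesh (positive if normals point out)."""
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    return np.einsum("ij,ij->", v0, np.cross(v1, v2)) / 6.0

def orientation_consistency(strand_a, strand_b):
    """Mean cosine similarity between matching segment directions of two strands."""
    da = _unit(np.diff(np.asarray(strand_a, dtype=float), axis=0))
    db = _unit(np.diff(np.asarray(strand_b, dtype=float), axis=0))
    return float(np.mean(np.sum(da * db, axis=1)))

def mean_curvature(points):
    """Mean turning angle per unit length along a polyline (discrete curvature)."""
    seg = np.diff(np.asarray(points, dtype=float), axis=0)
    length = np.linalg.norm(seg, axis=1)
    d = seg / length[:, None]
    cos = np.clip(np.sum(d[:-1] * d[1:], axis=1), -1.0, 1.0)
    return float(np.mean(np.arccos(cos) / (0.5 * (length[:-1] + length[1:]))))

# Usage: a straight strand along z and a curved strand (arc of radius 2).
straight = np.stack([np.zeros(5), np.zeros(5), np.linspace(0.0, 1.0, 5)], axis=1)
theta = np.linspace(0.0, np.pi / 2, 100)
arc = np.stack([2 * np.cos(theta), 2 * np.sin(theta), np.zeros_like(theta)], axis=1)
verts, faces = strand_to_prism(straight, radius=0.1)
print(is_watertight(faces), mean_curvature(straight), mean_curvature(arc))
```

In this sketch the closed prism gives every edge exactly two incident triangles, which is the property the Figure 2 caption contrasts with non-watertight quad meshes. The per-point frame is chosen independently at each point, so it can twist along strongly curved strands; a production version would likely propagate a frame along the strand instead.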