S2TD-Face: Reconstruct a Detailed 3D Face with Controllable Texture from a Single Sketch

Zidu Wang, Xiangyu Zhu, Jiang Yu, Tianshuo Zhang, Zhen Lei

TL;DR

S2TD-Face tackles sketch-to-3D-face reconstruction by introducing a two-stage geometry pipeline that first predicts coarse 3DMM-based geometry and then refines details with a UV-space displacement map, guided by a novel sketch-to-geometry loss that preserves delicate sketch features. A texture-control module leverages CLIP-based text-image matching to select textures from a library and fuse them onto the UV-mapped geometry, using PCA albedo to fill occluded regions. The framework operates without 3D scans, using 2D supervisory signals (landmarks, segmentation) and differentiable rendering to supervise both coarse and fine geometry across diverse sketch styles. Extensive experiments on the Sketch-REALY benchmark show state-of-the-art geometry accuracy and high-quality, controllable textures, with practical applications in avatars, animation, and missing-person search. The approach highlights the importance of aligning geometry directly with sketch features while enabling expressive texture variation via natural language prompts.
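The texture-control step described above can be pictured as two small operations. First, pick the library texture whose image embedding has the highest cosine similarity to the prompt embedding. Second, blend that texture with a PCA albedo in UV space using a visibility mask, so occluded texels fall back to the albedo. The following is a minimal sketch under stated assumptions, not the authors' implementation. The embeddings are taken as precomputed arrays, so no CLIP model is called. The function names `select_texture` and `fuse_uv_texture` and the linear mask blend are hypothetical.

```python
import numpy as np


def select_texture(text_emb: np.ndarray, library_embs: np.ndarray) -> int:
    """Return the index of the library texture that best matches the prompt.

    text_emb:     (D,) prompt embedding (assumed precomputed by a CLIP-like text encoder).
    library_embs: (N, D) image embeddings of the N library textures (assumed precomputed).
    """
    t = text_emb / np.linalg.norm(text_emb)
    lib = library_embs / np.linalg.norm(library_embs, axis=1, keepdims=True)
    scores = lib @ t  # cosine similarity between the prompt and each library entry
    return int(np.argmax(scores))


def fuse_uv_texture(sampled_uv: np.ndarray, visibility: np.ndarray,
                    pca_albedo_uv: np.ndarray) -> np.ndarray:
    """Keep visible texels of the selected texture and fill occluded ones with PCA albedo.

    sampled_uv, pca_albedo_uv: (H, W, 3) textures in UV space.
    visibility: (H, W) weights in [0, 1]; 1 = texel visible, 0 = occluded (hypothetical mask).
    """
    m = visibility[..., None]  # broadcast the mask over the RGB channels
    return m * sampled_uv + (1.0 - m) * pca_albedo_uv


if __name__ == "__main__":
    library = np.eye(4)  # four toy "texture" embeddings
    prompt = np.array([0.1, 0.2, 0.9, 0.0])
    print(select_texture(prompt, library))
```

A soft (fractional) mask lets the seam between the selected texture and the albedo be feathered rather than hard-edged. Whether the paper blends this way is not stated here.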

Abstract

Reconstructing textured 3D faces from sketches is a highly promising but underdeveloped research topic, with applications in animation, 3D avatars, artistic design, missing-person search, and more. On the one hand, because sketches vary widely in style, existing sketch-to-3D-face methods can only handle pose-limited, realistically shaded sketches. On the other hand, texture plays a vital role in representing facial appearance, yet sketches carry no texture information, so the reconstruction process needs additional texture control. This paper proposes S2TD-Face, a novel method for reconstructing detailed 3D faces with controllable texture from sketches. S2TD-Face introduces a two-stage geometry reconstruction framework that reconstructs detailed geometry directly from the input sketch. To keep the geometry consistent with the delicate strokes of the sketch, we propose a novel sketch-to-geometry loss that ensures the reconstruction accurately fits input features such as dimples and wrinkles. Our training strategies rely neither on hard-to-obtain 3D face scans nor on labor-intensive hand-drawn sketches. Furthermore, S2TD-Face introduces a texture control module that uses text prompts to select the most suitable textures from a library and seamlessly integrates them into the geometry, yielding a detailed 3D face with controllable texture. S2TD-Face outperforms existing state-of-the-art methods in extensive quantitative and qualitative experiments. Our project is available at https://github.com/wang-zidu/S2TD-Face .
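The two-stage geometry framework can be illustrated with a linear 3DMM for the coarse stage and a UV-space displacement map for the fine stage. In the fine stage, each vertex is pushed along its normal by the displacement sampled at its UV coordinate. This is a minimal sketch under stated assumptions. The basis layout (flattened xyz per vertex), the nearest-neighbour UV lookup with v flipped to image rows, the area-weighted normals, and all function names are hypothetical. The networks that predict the 3DMM coefficients and the displacement map are not shown.

```python
import numpy as np


def coarse_shape(mean, basis_id, basis_exp, alpha, beta):
    """Linear 3DMM (assumed layout): (3V,) mean + identity and expression offsets -> (V, 3)."""
    flat = mean + basis_id @ alpha + basis_exp @ beta
    return flat.reshape(-1, 3)


def vertex_normals(verts, faces):
    """Area-weighted unit vertex normals from (F, 3) triangle indices."""
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    face_n = np.cross(v1 - v0, v2 - v0)  # length equals twice the triangle area
    normals = np.zeros_like(verts, dtype=float)
    for k in range(3):
        np.add.at(normals, faces[:, k], face_n)  # accumulate onto each corner vertex
    norms = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.maximum(norms, 1e-12)


def apply_displacement(verts, faces, uv, disp_map):
    """Offset each vertex along its normal by the displacement at its UV coordinate.

    uv: (V, 2) coordinates in [0, 1]; disp_map: (H, W) scalar displacements.
    Uses a nearest-neighbour lookup and maps v = 1 to the top image row (assumption).
    """
    h, w = disp_map.shape
    cols = np.clip(np.rint(uv[:, 0] * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(np.rint((1.0 - uv[:, 1]) * (h - 1)).astype(int), 0, h - 1)
    d = disp_map[rows, cols]
    return verts + d[:, None] * vertex_normals(verts, faces)


if __name__ == "__main__":
    verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    faces = np.array([[0, 1, 2]])
    uv = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    print(apply_displacement(verts, faces, uv, np.full((4, 4), 0.5)))
```

Keeping the detail stage in UV space means the displacement map has a fixed resolution, independent of the mesh topology. This is the usual reason such pipelines store fine detail as an image rather than as per-vertex offsets.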

Paper Structure

This paper contains 19 sections, 14 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: S2TD-Face can reconstruct high-fidelity geometry from face sketches. The texture control module seamlessly applies suitable textures to the geometry based on text prompts. The results can be relit for various application scenarios.
  • Figure 2: Data samples of S2TD-Face. (a)-(e) are sketches in different styles generated from the original image (f); (g) shows landmarks and (h) shows segmentation. The pipeline takes sketches (a)-(e) as input, while (f)-(h) serve as supervisory signals.
  • Figure 3: Overview of our method. (a): The input of S2TD-Face, a face sketch and a text prompt. (b): The geometry reconstruction framework yields detailed 3D faces that accurately reflect the delicate features of the input sketches. (c): The texture control module seamlessly applies the controllable texture to the geometry according to text prompts. (d): The output of S2TD-Face, a detailed 3D face with controllable texture.
  • Figure 4: Overview of the sketch-to-geometry loss. ${\mathcal{L}_\mathrm{sketch}}$ compares the predicted sketches $\{{\bm{S}}_{t_j}^a,{\bm{S}}_{t_j}^b,{\bm{S}}_{t_j}^c,{\bm{S}}_{t_j}^d\}$ with the ground truth ${\bm{S}}_{t_j}$ to supervise the geometry deformation, yielding detailed geometry consistent with the delicate features of the input.
  • Figure 5: Test samples (7 of 100) of Sketch-REALY. (a): Original test images from REALY chai2022realy. (b) and (c): The two styles (Shading and Line) of the test images in Sketch-REALY. (d): The face scans used for geometry evaluation in Sketch-REALY.
  • ...and 5 more figures
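The Figure 4 caption states that ${\mathcal{L}_\mathrm{sketch}}$ compares several predicted sketches against the ground-truth sketch ${\bm{S}}_{t_j}$. The caption does not specify the distance. The sketch below assumes a masked per-pixel L1 distance, computed for each predicted sketch and averaged over them. `sketch_loss` and the optional face mask are hypothetical. How the predicted sketches are rendered from the geometry (via differentiable rendering) is outside this sketch.

```python
import numpy as np


def sketch_loss(predicted, target, mask=None):
    """Assumed form of a sketch-to-geometry loss: masked mean L1, averaged over predictions.

    predicted: sequence of (H, W) predicted sketches (e.g. the four variants in Figure 4).
    target:    (H, W) ground-truth sketch.
    mask:      optional (H, W) face-region weights; defaults to every pixel.
    """
    target = np.asarray(target, dtype=float)
    mask = np.ones_like(target) if mask is None else np.asarray(mask, dtype=float)
    denom = max(float(mask.sum()), 1.0)  # avoid division by zero for an empty mask
    per_sketch = [
        float(np.sum(mask * np.abs(np.asarray(p, dtype=float) - target))) / denom
        for p in predicted
    ]
    return float(np.mean(per_sketch))


if __name__ == "__main__":
    gt = np.zeros((4, 4))
    preds = [np.zeros((4, 4)), np.full((4, 4), 0.5)]
    print(sketch_loss(preds, gt))
```

Averaging over several predicted sketches, rather than using a single one, lets each variant contribute a separate gradient signal to the same geometry.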