Sketch-1-to-3: One Single Sketch to 3D Detailed Face Reconstruction

Liting Wen, Zimo Yang, Xianlin Zhang, Chi Ding, Mingdao Wang, Xueming Li

TL;DR

This work tackles the reconstruction of high-fidelity 3D faces from a single hand-drawn sketch by introducing Sketch-1-to-3, a two-stage, FLAME-based framework that transfers information directly from 2D sketches to 3D space rather than routing through an intermediate photo. A key contribution is the Geometric Contour and Texture Detail (GCTD) module, which improves contour detection and fine-detail preservation in both the coarse and the detail reconstruction stages, aided by a domain-adaptive learning strategy. To address data scarcity and the domain gap, the authors release SketchFaces (real sketches) and Syn-SketchFaces (synthetic sketches) and employ inter-layer feature-statistics mixing to bridge the synthetic and real sketch distributions. Quantitative and qualitative results show state-of-the-art sketch-to-3D reconstruction that remains robust under occlusions and diverse sketch styles, and user studies support its practical utility. These advances enable realistic, sketch-driven 3D facial modeling for applications in animation, game development, and digital humans.

Abstract

3D face reconstruction from a single sketch is a critical yet underexplored task with significant practical applications. The primary challenges stem from the substantial modality gap between 2D sketches and 3D facial structures, including: (1) accurately extracting facial keypoints from 2D sketches; (2) preserving diverse facial expressions and fine-grained texture details; and (3) training a high-performing model with limited data. In this paper, we propose Sketch-1-to-3, a novel framework for realistic 3D face reconstruction from a single sketch, to address these challenges. Specifically, we first introduce the Geometric Contour and Texture Detail (GCTD) module, which enhances the extraction of geometric contours and texture details from facial sketches. Additionally, we design a deep learning architecture with a domain adaptation module and a tailored loss function to align sketches with the 3D facial space, enabling high-fidelity expression and texture reconstruction. To facilitate evaluation and further research, we construct SketchFaces, a real hand-drawn facial sketch dataset, and Syn-SketchFaces, a synthetic facial sketch dataset. Extensive experiments demonstrate that Sketch-1-to-3 achieves state-of-the-art performance in sketch-based 3D face reconstruction.

Paper Structure

This paper contains 34 sections, 11 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: A user draws a facial sketch (left), and our Sketch-1-to-3 system produces a detailed 3D face reconstruction (right) that faithfully preserves geometry and fine details, remaining robust to occlusions and stylistic stroke variations.
  • Figure 2: Challenge demonstration. The yellow line denotes a sketch-photo-3D pipeline using SketchFaceNeRF [lin2023sketchfacenerf] and [feng2021learning], while the green line represents our method. The indirect sketch-photo-3D process tends to introduce reconstruction bias.
  • Figure 3: An overview of the proposed method, which consists of coarse and detail training stages. In the coarse stage, a shape encoder $E_{s}$ regresses $\bm{\beta}$ and a coarse encoder $E_{c}$ regresses $\bm{\theta}$, $\bm{\psi}$, $\bm{l}$, $\bm{c}$, and $\bm{\alpha}$. In the detail stage, a detail encoder $E_{d}$ generates a latent code $\bm{\delta}$ to refine the coarse 3D face with fine details. The 3D faces from both stages are rendered into 2D and compared with the input sketch to compute reconstruction losses. A GCTD module is applied in both stages to enhance feature extraction. All encoders employ inter-layer feature mixing for domain adaptation.
  • Figure 4: (a) Syn-SketchFaces generation process: from original photos to PiDiNet contours to synthetic sketches. (b) Real sketch samples from the SketchFaces dataset.
  • Figure 5: Qualitative comparisons. The left half shows reconstructions from synthetic sketches, and the right half from real hand-drawn sketches. Each group of five columns includes (from left to right): (1) input sketch, (2) DeepSketch2Face [han2017deepsketch2face], (3) SketchFaceNeRF [lin2023sketchfacenerf], (4) the combination of [lin2023sketchfacenerf] and [feng2021learning], and (5) our Sketch-1-to-3.
  • ...and 5 more figures
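
The Figure 3 caption says that all encoders employ inter-layer feature mixing for domain adaptation, and the TL;DR describes this as mixing feature statistics between synthetic and real sketches. The page does not give the exact formulation, so the following is only a minimal sketch of one common way to mix feature statistics (in the style of MixStyle): per-channel means and standard deviations of two feature maps are interpolated, and one map is re-normalized with the mixed statistics. The function names, the flat-list feature layout, and the Beta(0.1, 0.1) sampling of the mixing weight are assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def channel_stats(channel, eps=1e-6):
    """Return (mean, std) of one flattened feature channel."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, math.sqrt(var + eps)


def mix_feature_statistics(feat_a, feat_b, lam):
    """Re-style feat_a with statistics interpolated between feat_a and feat_b.

    feat_a, feat_b: lists of channels, each channel a flat list of floats
                    (hypothetical layout, e.g. one intermediate encoder layer).
    lam: weight in [0, 1] on feat_a's own statistics; 1.0 leaves feat_a unchanged.
    """
    mixed = []
    for ch_a, ch_b in zip(feat_a, feat_b):
        mu_a, sig_a = channel_stats(ch_a)
        mu_b, sig_b = channel_stats(ch_b)
        # Interpolate the per-channel statistics of the two domains.
        mu_mix = lam * mu_a + (1.0 - lam) * mu_b
        sig_mix = lam * sig_a + (1.0 - lam) * sig_b
        # Normalize feat_a, then re-scale and shift with the mixed statistics.
        mixed.append([sig_mix * (v - mu_a) / sig_a + mu_mix for v in ch_a])
    return mixed


random.seed(0)
# Toy stand-ins for features of a synthetic sketch and a real sketch.
synthetic = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(3)]
real = [[random.gauss(2.0, 0.5) for _ in range(16)] for _ in range(3)]

lam = random.betavariate(0.1, 0.1)  # assumed sampling scheme, as in MixStyle
mixed = mix_feature_statistics(synthetic, real, lam)
```

The mixed features keep the spatial content of the synthetic sketch while their channel statistics move toward those of the real sketch. Applying such a step at several encoder layers is one plausible reading of "inter-layer" mixing.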