VGG-Tex: A Vivid Geometry-Guided Facial Texture Estimation Model for High Fidelity Monocular 3D Face Reconstruction

Haoyu Wu; Ziqiao Peng; Xukun Zhou; Yunfei Cheng; Jun He; Hongyan Liu; Zhaoxin Fan

TL;DR

This work tackles the long-standing gap in monocular 3D face reconstruction where texture quality lags behind geometric accuracy. It introduces VGG-Tex, a geometry-guided texture estimation framework that leverages FLAME-based priors through a dual-branch network (FAEM and CGTG), complemented by a Visibility-Enhanced Texture Completion module and a Texture-Guided Geometry Refinement training stage. The approach yields high-fidelity UV textures and competitive geometry on standard benchmarks, outperforming prior texture-focused methods while maintaining robust geometry reconstruction. These advances enable more realistic renderings for applications in AR/VR, animation, and telepresence by producing more faithful facial textures under varied poses and occlusions.

Abstract

3D face reconstruction from monocular images has promoted the development of various applications such as augmented reality. Though existing methods have made remarkable progress, most of them emphasize geometric reconstruction, while overlooking the importance of texture prediction. To address this issue, we propose VGG-Tex, a novel Vivid Geometry-Guided Facial Texture Estimation model designed for High Fidelity Monocular 3D Face Reconstruction. The core of this approach is leveraging 3D parametric priors to enhance the outcomes of 2D UV texture estimation. Specifically, VGG-Tex includes a Facial Attributes Encoding Module, a Geometry-Guided Texture Generator, and a Visibility-Enhanced Texture Completion Module. These components are responsible for extracting parametric priors, generating initial textures, and refining texture details, respectively. Based on the geometry-texture complementarity principle, VGG-Tex also introduces a Texture-guided Geometry Refinement Module to further balance the overall fidelity of the reconstructed 3D faces, along with corresponding losses. Comprehensive experiments demonstrate that our method significantly improves texture reconstruction performance compared to existing state-of-the-art methods.

Paper Structure (16 sections, 13 equations, 5 figures, 4 tables)

Introduction
Related work
Geometry Estimation in Monocular 3D Face Reconstruction
Texture Estimation in Monocular 3D Face Reconstruction
Method
Facial Attributes Encoding Module
Geometry-Guided Texture Generator
Visibility-Enhanced Texture Completion Module
Texture-Guided Geometry Refinement training stage
Loss Function
Experiments
Implementation Details
Comparison on Facial Texture Estimation
Comparison on Facial Geometry Reconstruction
Ablation Study
...and 1 more section

Figures (5)

Figure 1: Intuition of VGG-Tex. A comparison between FFHQ-UV and our method shows that the texture of a 3D face can greatly affect how humans perceive it, even when the geometric details are not very fine.
Figure 2: Illustration of the VGG-Tex architecture. VGG-Tex consists of a dual-branch architecture. The top branch is a Facial Attributes Encoding Module for latent geometry extraction and 3D face geometry prediction, while the bottom branch is a Geometry-Guided Generator that takes the image and geometry guidance as input for UV texture estimation. During training, the Visibility-Enhanced Texture Completion Module plays a critical role by adding random masks to input images, simulating the occluded regions often encountered in in-the-wild scenarios.
Figure 3: Illustration of the Texture-Guided Geometry Refinement training stage. The procedure begins by reconstructing a 3D mesh and UV texture from the input image, then samples a challenging head pose. Projecting the 3D head model into 2D image space under the sampled pose yields an augmented input image, denoted $\mathcal{I}_{r}$. This augmented image is fed into the geometry prediction module, which refines the pose and camera parameters by optimizing the 2D landmarks, allowing the model to handle challenging head poses more effectively.
Figure 4: Comparison of rendering quality with other texture estimation methods. Our method produces the most realistic renderings and blends well with the original image.
Figure 5: Qualitative ablation study results.
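
The random masking mentioned in the Figure 2 caption can be illustrated with a minimal sketch. The captions do not say how masks are shaped or sized, so the single axis-aligned rectangle, the `max_frac` bound, the `fill` value, and the function name `apply_random_mask` are all assumptions made for illustration and are not taken from the paper. The image is modeled as a plain list of rows to keep the example dependency-free.

```python
import random


def apply_random_mask(image, max_frac=0.5, fill=0, rng=None):
    """Hide one random rectangle of a 2D pixel grid (hypothetical helper).

    image: list of rows (H x W); pixels can be numbers or tuples.
    Returns (masked_image, visibility), where visibility holds 1 for
    pixels that were kept and 0 for pixels that were masked out.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])

    # Assumed policy: rectangle sides are at most max_frac of the image
    # sides and at least one pixel long.
    mask_h = rng.randint(1, max(1, int(h * max_frac)))
    mask_w = rng.randint(1, max(1, int(w * max_frac)))
    top = rng.randint(0, h - mask_h)
    left = rng.randint(0, w - mask_w)

    # Copy the rows so the caller's image is left untouched.
    masked = [list(row) for row in image]
    visibility = [[1] * w for _ in range(h)]
    for y in range(top, top + mask_h):
        for x in range(left, left + mask_w):
            masked[y][x] = fill
            visibility[y][x] = 0
    return masked, visibility


if __name__ == "__main__":
    img = [[1] * 8 for _ in range(8)]
    masked, vis = apply_random_mask(img, rng=random.Random(0))
    for row in vis:
        print("".join(str(v) for v in row))
```

Returning the visibility grid alongside the masked image is a design choice for this sketch: a completion module trained on such pairs could, in principle, be supervised only on the hidden region, though whether VGG-Tex does so is not stated here.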
