3D Priors-Guided Diffusion for Blind Face Restoration

Xiaobin Lu, Xiaobin Hu, Jun Luo, Ben Zhu, Yaping Ruan, Wenqi Ren

TL;DR

This paper tackles the challenge of blind face restoration by balancing realism and fidelity under diverse degradations. It introduces a diffusion-based framework guided by 3D facial priors, implemented via a two-branch architecture that combines a 3D reconstruction block (SwinIR + 3DMM) with a diffusion branch (U-Net, multi-level 3D features, and a Time-Aware Fusion Block). The method maps structural and identity information from 3D priors into noise estimation and uses time-conditioned fusion to adapt guidance across diffusion steps, optimizing with a diffusion objective and performing 100-step inference. Experimental results on synthetic and real datasets show superior identity preservation and image quality compared to state-of-the-art methods, and the authors provide code for reproducibility.

Abstract

Blind face restoration aims to recover a clear face image from a degraded counterpart. Recent approaches that use Generative Adversarial Networks (GANs) as priors have demonstrated remarkable success in this field. However, these methods struggle to balance realism and fidelity, particularly in complex degradation scenarios. To inherit the strong realistic generative ability of diffusion models while remaining constrained by identity-aware fidelity, we propose a novel diffusion-based framework that embeds 3D facial priors as structure and identity constraints into the denoising diffusion process. Specifically, to obtain more accurate 3D prior representations, the 3D facial image is reconstructed by a 3D Morphable Model (3DMM) from an initial restored face image produced by a pretrained restoration network. A customized multi-level feature extraction method exploits both the structural and identity information of the 3D facial images, which is then mapped into the noise estimation process. To enhance the fusion of identity information into the noise estimation, we propose a Time-Aware Fusion Block (TAFB). This module provides a more efficient and adaptive fusion of weights for denoising, accounting for the dynamic nature of the denoising process in the diffusion model, which involves initial structure refinement followed by texture detail enhancement. Extensive experiments demonstrate that our network performs favorably against state-of-the-art algorithms on synthetic and real-world datasets for blind face restoration. The code is released on our project page at https://github.com/838143396/3Diffusion.
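
The idea of time-aware fusion described above can be illustrated with a minimal NumPy sketch. This is not the authors' TAFB implementation: the sinusoidal timestep embedding, the per-channel sigmoid gate, the residual addition, and every name here (`timestep_embedding`, `time_gate`, `time_aware_fusion`, `w_gate`, `b_gate`) are assumptions chosen only to show how the weight given to 3D-prior features could vary with the diffusion step.

```python
import numpy as np

def timestep_embedding(t, dim):
    """Sinusoidal embedding of an integer diffusion step t (assumed, DDPM-style)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def time_gate(t, w_gate, b_gate):
    """Per-channel gate in (0, 1) computed from the timestep embedding.

    w_gate has shape (C, D) and b_gate has shape (C,); both stand in for
    hypothetical learned parameters.
    """
    emb = timestep_embedding(t, w_gate.shape[1])
    return 1.0 / (1.0 + np.exp(-(w_gate @ emb + b_gate)))

def time_aware_fusion(unet_feat, prior_feat, t, w_gate, b_gate):
    """Add 3D-prior features to denoiser features, scaled by the time gate.

    unet_feat and prior_feat both have shape (C, H, W).
    """
    gate = time_gate(t, w_gate, b_gate)
    # Broadcast the per-channel gate over the spatial dimensions.
    return unet_feat + gate[:, None, None] * prior_feat

# Toy inputs: random features and random (untrained) gate parameters.
rng = np.random.default_rng(0)
C, H, W, D = 8, 4, 4, 16
unet_feat = rng.standard_normal((C, H, W))
prior_feat = rng.standard_normal((C, H, W))
w_gate = rng.standard_normal((C, D)) * 0.1
b_gate = np.zeros(C)

fused_early = time_aware_fusion(unet_feat, prior_feat, 90, w_gate, b_gate)
fused_late = time_aware_fusion(unet_feat, prior_feat, 5, w_gate, b_gate)
print(fused_early.shape, fused_late.shape)
```

With trained parameters, a gate of this kind could in principle weight the prior differently at noisy early steps (structure refinement) than at late steps (texture detail); here the weights are random, so only the mechanism is shown.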

Paper Structure (12 sections, 18 equations, 7 figures, 4 tables)

This paper contains 12 sections, 18 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Results on the ArcFace identity score (IDS). The results in the first line show that our method is more consistent with the ground truth (GT) in restoring facial features.
  • Figure 2: Comparison of reconstructing 3D faces from the 3D Morphable Model (3DMM) and ours.
  • Figure 3: The architecture of 3D priors embedded diffusion model. Top: Our framework consists of two parts: the 3D reconstruction block and the denoising diffusion branch. Bottom: The TAFB module fuses 3D features with features extracted by the denoising network.
  • Figure 4: Qualitative comparisons of blind face restoration methods on the CelebA-Test dataset karras2017progressive. Our method performs better in both identity consistency and structure consistency.
  • Figure 5: Comparisons of blind face restoration methods on the real-world datasets. The results in the first row are from the LFW-Test dataset huang2008labeled, the results in the second row are from the WebPhoto dataset wang2021towards, and the results in the third and fourth rows are from the WIDER-Test dataset yang2016wider.
  • ...and 2 more figures