3D Priors-Guided Diffusion for Blind Face Restoration
Xiaobin Lu, Xiaobin Hu, Jun Luo, Ben Zhu, Yaping Ruan, Wenqi Ren
TL;DR
This paper tackles the challenge of blind face restoration by balancing realism and fidelity under diverse degradations. It introduces a diffusion-based framework guided by 3D facial priors, implemented via a two-branch architecture that combines a 3D reconstruction block (SwinIR + 3DMM) with a diffusion branch (U-Net, multi-level 3D features, and a Time-Aware Fusion Block). The method maps structural and identity information from 3D priors into noise estimation and uses time-conditioned fusion to adapt guidance across diffusion steps, optimizing with a diffusion objective and performing 100-step inference. Experimental results on synthetic and real datasets show superior identity preservation and image quality compared to state-of-the-art methods, and the authors provide code for reproducibility.
Abstract
Blind face restoration aims to recover a clear face image from a degraded counterpart. Recent approaches that use Generative Adversarial Networks (GANs) as priors have demonstrated remarkable success in this field. However, these methods struggle to balance realism and fidelity, particularly in complex degradation scenarios. To inherit the exceptional generative realism of diffusion models while constraining the output with identity-aware fidelity, we propose a novel diffusion-based framework that embeds 3D facial priors as structure and identity constraints into the denoising diffusion process. Specifically, to obtain more accurate 3D prior representations, the 3D facial image is reconstructed by a 3D Morphable Model (3DMM) from an initially restored face image produced by a pretrained restoration network. A customized multi-level feature extraction method exploits both the structural and identity information of the 3D facial images, which is then mapped into the noise estimation process. To enhance the fusion of identity information into the noise estimation, we propose a Time-Aware Fusion Block (TAFB). This module provides more efficient and adaptive fusion weights for denoising by accounting for the dynamic nature of the denoising process in the diffusion model, which first refines structure and then enhances texture detail. Extensive experiments demonstrate that our network performs favorably against state-of-the-art algorithms on synthetic and real-world datasets for blind face restoration. The code is released on our project page at https://github.com/838143396/3Diffusion.
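To make the idea of time-conditioned fusion concrete, the following is a minimal NumPy sketch of one way such a block could work. It is not the authors' TAFB: the class name `TimeAwareFusionSketch`, the sinusoidal timestep embedding, the per-channel sigmoid gate, the additive fusion rule, and all shapes are assumptions chosen for illustration, and the weights are random rather than learned. The sketch only shows how a gate computed from the diffusion step can change how strongly 3D-prior features are mixed into the denoiser's features.

```python
import numpy as np

def timestep_embedding(t, dim):
    """Sinusoidal embedding of an integer diffusion step t (assumed, DDPM-style)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TimeAwareFusionSketch:
    """Hypothetical stand-in for a time-aware fusion block (not the paper's TAFB)."""

    def __init__(self, channels, emb_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.emb_dim = emb_dim
        # Random projection from the timestep embedding to one gate per channel.
        # In a trained model these would be learned parameters.
        self.w = rng.normal(scale=0.1, size=(channels, emb_dim))
        self.b = np.zeros(channels)

    def gate(self, t):
        """Per-channel fusion weight in (0, 1) that depends on the step t."""
        return sigmoid(self.w @ timestep_embedding(t, self.emb_dim) + self.b)

    def __call__(self, unet_feat, prior_feat, t):
        """Add gated 3D-prior features to denoiser features of shape (C, H, W)."""
        g = self.gate(t)[:, None, None]  # broadcast the gate over H and W
        return unet_feat + g * prior_feat

# Usage: the same features are fused differently at different diffusion steps.
fusion = TimeAwareFusionSketch(channels=8)
unet_feat = np.zeros((8, 16, 16))   # placeholder denoiser features
prior_feat = np.ones((8, 16, 16))   # placeholder 3D-prior features
early = fusion(unet_feat, prior_feat, t=900)  # noisy, structure-dominated step
late = fusion(unet_feat, prior_feat, t=10)    # nearly clean, texture-dominated step
```

In this sketch the gate is the only place where the step index enters, so the difference between `early` and `late` comes entirely from the time conditioning. A learned version would let training decide how much prior guidance each stage of denoising receives.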
