AuthFace: Towards Authentic Blind Face Restoration with Face-oriented Generative Diffusion Prior

Guoqiang Liang, Qingnan Fan, Bingtao Fu, Jinwei Chen, Hong Gu, Lin Wang

TL;DR

This paper proposes a novel framework, namely AuthFace, that achieves highly authentic face restoration results by exploring a face-oriented generative diffusion prior, and introduces a novel face-oriented restoration-tuning pipeline that fine-tunes a pretrained T2I model.

Abstract

Blind face restoration (BFR) is a fundamental and challenging problem in computer vision. To faithfully restore high-quality (HQ) photos from poor-quality ones, recent research predominantly relies on facial image priors from powerful pretrained text-to-image (T2I) diffusion models. However, such priors often lead to the incorrect generation of non-facial features and to insufficient facial details, which makes them less practical for real-world applications. In this paper, we propose a novel framework, namely AuthFace, that achieves highly authentic face restoration results by exploring a face-oriented generative diffusion prior. To learn such a prior, we first collect a dataset of 1.5K high-quality images, with resolutions exceeding 8K, captured by professional photographers. Based on this dataset, we then introduce a novel face-oriented restoration-tuning pipeline that fine-tunes a pretrained T2I model. Following the key criteria of quality-first and photography-guided annotation, we retouch and review the images under the guidance of photographers so that the retained high-quality images show rich facial features. The photography-guided annotation system fully explores the potential of these high-quality photographic images. In this way, the potent natural image priors from pretrained T2I diffusion models can be subtly harnessed, specifically enhancing their capability in facial detail restoration. Moreover, to minimize artifacts in critical facial areas, such as the eyes and mouth, we propose a time-aware latent facial feature loss to learn the authentic face restoration process. Extensive experiments on synthetic and real-world BFR datasets demonstrate the superiority of our approach.

Paper Structure

This paper contains 15 sections, 4 equations, 16 figures, 6 tables.

Figures (16)

  • Figure 1: Compared with the results from a state-of-the-art (SOTA) method, SUPIR [yu2024scaling], which uses StableDiffusion-XL (SDXL) [podell2023sdxl] as its prior, our approach excels in capturing and rendering intricate facial details. For instance, our result in the second row has a more distinct jawline (see blue arrow), effectively distinguishing the jaw from the neck. Zoom in for more details.
  • Figure 2: (a) An HQ face image with its paired tags generated through photography-guided image annotation. Specifically, we provide an additional photographic tag (blue box) beyond the semantic tags used in previous methods (gray box). (b) Qualitative comparison in the T2I task between StableDiffusion-XL (SDXL) [podell2023sdxl] and our fine-tuned model, which is trained exclusively on the collected high-quality dataset. Notably, SDXL tends to generate over-smooth skin even when given prompts specifying sharp details and sharp focus. Zoom in for more details.
  • Figure 3: The framework of face-oriented tuning.
  • Figure 4: An overview of Stage II. The denoising UNet, carried over from Stage I, keeps its facial priors because its parameters are frozen, while ControlNet acts as an adapter for handling degraded inputs.
  • Figure 5: Visualization of the diffusion process at different steps. In the early steps (t = 999 to 599), the main content of the images is predominantly noise, with key facial features obscured. In the later steps (t = 61 to 0), the shapes of key facial features become fixed, with minimal changes.
  • ...and 11 more figures
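
The time-aware loss mentioned in the abstract and the step ranges in the Figure 5 caption suggest weighting a facial-region loss by diffusion timestep. The paper's actual formulation is not reproduced in this section, so the following is only a minimal sketch under stated assumptions: a hard gate that is active for intermediate steps, a binary mask marking facial-region positions, and a masked mean squared error. All names (`time_gate`, `masked_mse`, `time_aware_facial_loss`) and the use of the caption's values 599 and 61 as thresholds are hypothetical choices for illustration.

```python
from typing import Sequence

# Assumed thresholds, read off the Figure 5 caption; not the paper's settings.
T_NOISY = 599  # at or above this step the image is treated as mostly noise
T_FIXED = 61   # at or below this step facial shapes are treated as fixed


def time_gate(t: int, t_noisy: int = T_NOISY, t_fixed: int = T_FIXED) -> float:
    """Hypothetical hard gate: 1.0 for intermediate steps, 0.0 otherwise."""
    return 1.0 if t_fixed < t < t_noisy else 0.0


def masked_mse(pred: Sequence[float], target: Sequence[float],
               mask: Sequence[int]) -> float:
    """Mean squared error over positions where mask == 1 (0.0 if mask is empty)."""
    total = sum(m * (p - q) ** 2 for p, q, m in zip(pred, target, mask))
    count = sum(mask)
    return total / count if count else 0.0


def time_aware_facial_loss(pred: Sequence[float], target: Sequence[float],
                           mask: Sequence[int], t: int) -> float:
    """Facial-region loss that only contributes at gated timesteps (a sketch)."""
    return time_gate(t) * masked_mse(pred, target, mask)
```

With flattened features `pred = [1.0, 2.0, 3.0, 4.0]`, `target = [1.0, 0.0, 3.0, 2.0]`, and `mask = [0, 1, 0, 1]`, the sketch contributes a loss at an intermediate step such as `t = 300` and contributes nothing at `t = 800` or `t = 30`. A smooth weighting over timesteps would be an equally plausible design; the hard gate is used here only because it is the simplest form to read.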