
DifFace: Blind Face Restoration with Diffused Error Contraction

Zongsheng Yue, Chen Change Loy

TL;DR

DifFace tackles blind face restoration (BFR) under unknown, severe degradations by casting restoration as posterior inference that passes through an intermediate diffused state. It uses a pretrained diffusion model as the image prior and learns a Gaussian transition $p(\bm{x}_N \mid \bm{y}_0)$ from the low-quality image $\bm{y}_0$ to the diffused state $\bm{x}_N$ through a diffused estimator trained with $L_2$ loss; the high-quality image is then reconstructed by DDIM sampling with controlled randomness. The method achieves state-of-the-art or competitive results on synthetic and real-world BFR datasets and extends naturally to inpainting and other restoration tasks. Its training pipeline is simple, and its robustness comes from error contraction: diffusing the estimator's output shrinks the estimator's error before the diffusion prior takes over. A key advantage is that sampling with different seeds yields multiple plausible HQ outputs, reflecting the inherent ambiguity of ill-posed restoration problems. Diffusion-based inference adds computational cost, but the framework remains practical and flexible for real-world face restoration.

Abstract

While deep learning-based methods for blind face restoration have achieved unprecedented success, they still suffer from two major limitations. First, most of them deteriorate when facing complex degradations outside their training data. Second, these methods require multiple constraints, e.g., fidelity, perceptual, and adversarial losses, which require laborious hyper-parameter tuning to stabilize and balance their influences. In this work, we propose a novel method named DifFace that is capable of coping with unseen and complex degradations more gracefully without complicated loss designs. The key to our method is to establish a posterior distribution from the observed low-quality (LQ) image to its high-quality (HQ) counterpart. In particular, we design a transition distribution from the LQ image to the intermediate state of a pre-trained diffusion model and then gradually move from this intermediate state to the HQ target by recursively applying the pre-trained diffusion model. The transition distribution relies only on a restoration backbone that is trained with $L_2$ loss on some synthetic data, which favorably avoids the cumbersome training process in existing methods. Moreover, the transition distribution can contract the error of the restoration backbone and thus makes our method more robust to unknown degradations. Comprehensive experiments show that DifFace is superior to current state-of-the-art methods, especially in cases with severe degradations. Code and model are available at https://github.com/zsyOAOA/DifFace.
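
The transition to the intermediate state and the resulting error contraction can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes the transition mirrors the standard diffusion forward marginal, i.e. $\bm{x}_N \sim \mathcal{N}(\sqrt{\bar{\alpha}_N}\, f(\bm{y}_0),\, (1-\bar{\alpha}_N)\bm{I})$, where $f(\bm{y}_0)$ is the restoration backbone's output. The linear beta schedule, the starting step `n = 400`, the toy 8x8 "image", and the function names are all hypothetical choices made for illustration; the reverse sampling with the pretrained diffusion model is omitted.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an ASSUMED linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def sample_diffused_state(estimate, alpha_bar_n, rng):
    """Draw x_N ~ N(sqrt(alpha_bar_N) * estimate, (1 - alpha_bar_N) * I)."""
    noise = rng.standard_normal(estimate.shape)
    return np.sqrt(alpha_bar_n) * estimate + np.sqrt(1.0 - alpha_bar_n) * noise

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
n = 400  # hypothetical starting timestep N

# Toy stand-ins: x0 plays the HQ image, estimate plays an imperfect f(y_0).
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))
estimate = x0 + 0.3 * rng.standard_normal(x0.shape)

# Different seeds give different diffused states, hence different final outputs.
x_n_a = sample_diffused_state(estimate, alpha_bar[n - 1], np.random.default_rng(1))
x_n_b = sample_diffused_state(estimate, alpha_bar[n - 1], np.random.default_rng(2))

# Error contraction: the backbone's error enters the mean of x_N scaled by
# sqrt(alpha_bar_N), which is strictly below 1 for any N >= 1.
err_before = np.linalg.norm(estimate - x0)
err_after = np.linalg.norm(np.sqrt(alpha_bar[n - 1]) * (estimate - x0))
contraction = err_after / err_before
print(f"contraction factor at N={n}: {contraction:.3f}")
```

Under these assumptions, a larger `n` shrinks the backbone's error further but also discards more of the estimate's detail, so the starting timestep trades robustness against fidelity.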

Paper Structure

This paper contains 28 sections, 18 equations, 18 figures, 11 tables, and 1 algorithm.

Figures (18)

  • Figure 1: Overview of the proposed method. The solid lines denote the whole inference pipeline of our method. For ease of comparison, we also mark out the forward and reverse processes of the diffusion model by dotted lines.
  • Figure 2: Comparative results of the proposed method to recent state-of-the-art approaches on the task of blind face restoration (top row) and face image inpainting (bottom row). Note that the masked area is highlighted using a purple color in the example of face inpainting.
  • Figure 3: Illustration of the diffused $\bm{x}_N$ (top row) and the reconstructed results (bottom row) by a pretrained diffusion model from different starting timesteps. Note that the employed diffusion model is trained with 1000 discrete steps, following Dhariwal and Nichol (2021).
  • Figure 4: The curves of $\kappa_N$ and $\alpha_N$ with the starting timestep $N$.
  • Figure 5: Comparison of existing deep learning-based approaches and our proposed method. For the former, we adopt $g(\cdot;\theta)$ to denote the learnable neural network with parameter $\theta$.
  • ...and 13 more figures