WaveFace: Authentic Face Restoration with Efficient Frequency Recovery

Yunqi Miao, Jiankang Deng, Jungong Han

TL;DR

WaveFace tackles blind face restoration (BFR) by moving the task into the frequency domain with the Discrete Wavelet Transform (DWT): a conditional diffusion model restores the low-frequency content, and a single-pass U-Net recovers the high-frequency details. This split shrinks the diffusion model's input to $1/16$ of the original image size, enabling substantial speedups, while the high-frequency branch preserves fine textures and identity. The method achieves state-of-the-art authenticity, particularly in identity preservation, and runs roughly 10× faster than existing diffusion-based BFR methods. Experiments on synthetic and real-world datasets show improved PSNR/SSIM, competitive LPIPS, and lower FID, with ablations confirming the importance of both the Low-frequency Conditional Denoising (LCD) and High-Frequency Refinement (HFR) modules. Limitations include a gap between synthetic and real-world degradations, which suggests future work on more realistic degradation modeling.

Abstract

Although diffusion models are emerging as a powerful solution for blind face restoration (BFR), they are criticized for two problems: 1) slow training and inference, and 2) failure to preserve identity and recover fine-grained facial details. In this work, we propose WaveFace to solve both problems in the frequency domain, where the low- and high-frequency components produced by wavelet decomposition are handled separately to maximize authenticity as well as efficiency. The diffusion model is applied to recover the low-frequency component only, which carries the general information of the original image at just 1/16 of its size. To preserve the original identity, generation is conditioned on the low-frequency component of the low-quality image at each denoising step. Meanwhile, the high-frequency components at multiple decomposition levels are handled by a unified network, which recovers complex facial details in a single step. Evaluations on four benchmark datasets show that: 1) WaveFace outperforms state-of-the-art methods in authenticity, especially in identity preservation, and 2) it restores authentic images 10x faster than existing diffusion-based BFR methods.
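
The 1/16 figure is consistent with a two-level decomposition, since each DWT level halves both height and width. The sketch below illustrates this arithmetic with a hand-written orthonormal Haar transform in NumPy. The Haar wavelet, the two levels, the 512×512 input, and all function names (`haar_dwt2`, `haar_idwt2`, `wavedec2`, `waverec2`) are assumptions made for illustration; they are not taken from the paper's implementation, and sub-band naming conventions vary between libraries.

```python
import numpy as np


def haar_dwt2(x):
    """One level of an orthonormal 2D Haar DWT on an (H, W) array with even H and W."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right pixel of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right pixel
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # differences across columns
    hl = (a + b - c - d) / 2.0  # differences across rows
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return ll, (lh, hl, hh)


def haar_idwt2(ll, highs):
    """Inverse of haar_dwt2: rebuild the (2H, 2W) array from its four sub-bands."""
    lh, hl, hh = highs
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def wavedec2(x, levels):
    """Apply haar_dwt2 repeatedly to the low-frequency band."""
    highs = []
    ll = x
    for _ in range(levels):
        ll, h = haar_dwt2(ll)
        highs.append(h)
    return ll, highs


def waverec2(ll, highs):
    """Undo wavedec2, starting from the coarsest level."""
    for h in reversed(highs):
        ll = haar_idwt2(ll, h)
    return ll


rng = np.random.default_rng(0)
image = rng.random((512, 512))  # stand-in for one image channel (assumed size)

ll, highs = wavedec2(image, levels=2)
ratio = ll.size / image.size
restored = waverec2(ll, highs)

print(ll.shape, ratio)               # (128, 128) 0.0625
print(np.allclose(restored, image))  # True
```

In this toy setting a diffusion model operating on `ll` would see 128×128 inputs instead of 512×512, while the six high-frequency sub-bands in `highs` are what a separate single-step network would have to restore before the inverse transform.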

Paper Structure

This paper contains 22 sections, 16 equations, 16 figures, 8 tables.

Figures (16)

  • Figure 1: Left: Illustration of our frequency-aware BFR scheme. Restoration is performed in the frequency domain instead of the pixel domain. Right: Comparisons with state-of-the-art face restoration methods on degraded images. Previous methods struggle to restore facial details or the original identity while our WaveFace achieves a good balance of realness and fidelity with fewer artifacts.
  • Figure 2: Overall framework of WaveFace. It consists of a Low-frequency Conditional Denoising (LCD) module and a High-Frequency Refinement (HFR) module. LCD predicts clean samples $\bm{x}_{0}$ from noise, conditioned on LQ inputs, over $T$ steps. Meanwhile, high-frequency sub-bands are concatenated as HFR inputs to recover vivid facial details. The predicted frequency components are projected back to the image domain via the inverse wavelet transform (IWT).
  • Figure 3: Visualization of DWT frequency components and images reconstructed by the low-frequency sub-band of an HQ image and high-frequency sub-bands of its LQ counterpart. DWT level $J$ and resolution of low- / high-frequency sub-bands ($\times N$) are reported.
  • Figure 4: Illustration of high-frequency recovery (HFR) module.
  • Figure 5: Qualitative comparison of different conditioning schemes on CelebA-Test. We adopt "Concat." in WaveFace. PSNR($\uparrow$) / Deg.($\downarrow$) are reported.
  • ...and 11 more figures
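
The sub-band swap described in the Figure 3 caption (low-frequency sub-band from the HQ image, high-frequency sub-bands from its LQ counterpart) can be mimicked on synthetic data. The following is a minimal sketch, assuming a single-level orthonormal Haar transform, a synthetic test pattern in place of a face, and a box blur plus Gaussian noise as a hypothetical stand-in for the degradation; none of these choices come from the paper. The reverse swap (`hybrid_low_from_lq`) is added here only to show how the LQ error splits between the two frequency groups.

```python
import numpy as np


def haar_dwt2(x):
    """One level of an orthonormal 2D Haar DWT on an (H, W) array with even H and W."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    highs = ((a - b + c - d) / 2.0, (a + b - c - d) / 2.0, (a - b - c + d) / 2.0)
    return ll, highs


def haar_idwt2(ll, highs):
    """Inverse of haar_dwt2."""
    lh, hl, hh = highs
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def sq_err(x, ref):
    """Sum of squared differences between two arrays."""
    return float(np.sum((x - ref) ** 2))


# Synthetic "HQ" image: smooth waves plus a fine checkerboard texture (assumed).
yy, xx = np.mgrid[0:64, 0:64]
hq = np.sin(xx / 8.0) + np.cos(yy / 11.0) + 0.2 * ((xx + yy) % 2)

# Hypothetical degradation: 3x3 box blur (wrap-around borders) plus Gaussian noise.
rng = np.random.default_rng(0)
shifts = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
blurred = sum(np.roll(hq, s, axis=(0, 1)) for s in shifts) / 9.0
lq = blurred + 0.05 * rng.standard_normal(hq.shape)

ll_hq, highs_hq = haar_dwt2(hq)
ll_lq, highs_lq = haar_dwt2(lq)

# Swap as in the Figure 3 caption: HQ low-frequency band + LQ high-frequency bands.
hybrid_high_from_lq = haar_idwt2(ll_hq, highs_lq)
# Reverse swap (illustration only): LQ low-frequency band + HQ high-frequency bands.
hybrid_low_from_lq = haar_idwt2(ll_lq, highs_hq)

err_total = sq_err(lq, hq)
err_high = sq_err(hybrid_high_from_lq, hq)  # error left in the high-frequency bands
err_low = sq_err(hybrid_low_from_lq, hq)    # error left in the low-frequency band

print(err_total, err_low, err_high)
```

Because this Haar transform is orthonormal, `err_low + err_high` equals `err_total` up to floating-point rounding, so the two numbers show how much of the degradation sits in each frequency group for this particular toy degradation. The split will differ for real faces and real degradations.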