BFRFormer: Transformer-based generator for Real-World Blind Face Restoration

Guojing Ge, Qi Song, Guibo Zhu, Yuting Zhang, Jinglu Chen, Miao Xin, Ming Tang, Jinqiao Wang

TL;DR

The paper addresses blind face restoration under unknown real-world degradations by introducing BFRFormer, a Transformer-based generator embedded in a GAN-prior framework that models long-range dependencies to mitigate over-smoothing. It couples a simple CNN encoder with a style-informed Transformer generator built around an Aggregated Attention Module (CAB channel attention plus double attention), and uses a wavelet discriminator to suppress blocking artifacts. Spectral normalization addresses training instability, and balanced consistency regularization (bCR) counters over-fitting. Key contributions include (1) a Transformer-based training method within the GAN-prior paradigm, (2) the Aggregated Attention Module, which activates more input pixels for restoration, (3) a large, diverse real-world test benchmark, and (4) extensive experiments reporting state-of-the-art results on one synthetic and four real-world datasets. The approach targets better identity preservation and detail fidelity, and the code and pre-trained models are publicly released.

Abstract

Blind face restoration is challenging because the degradation is unknown and complex. Although face prior-based and reference-based methods have recently demonstrated high-quality results, the restored images tend to be over-smoothed and to lose identity-preserving details when the degradation is severe. We attribute this to short-range dependencies, an intrinsic limitation of convolutional neural networks. To model long-range dependencies, we propose a Transformer-based blind face restoration method, named BFRFormer, which reconstructs images with more identity-preserving details in an end-to-end manner. To remove blocking artifacts, BFRFormer introduces a wavelet discriminator and an aggregated attention module; spectral normalization and balanced consistency regularization (bCR) are adaptively applied to address training instability and over-fitting, respectively. Extensive experiments show that our method outperforms state-of-the-art methods on a synthetic dataset and four real-world datasets. The source code, Casia-Test dataset, and pre-trained models are released at https://github.com/s8Znk/BFRFormer.
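
The abstract does not spell out how the wavelet discriminator is built. As a rough illustration of the kind of frequency-domain view such a discriminator could operate on, the sketch below computes a one-level 2D Haar transform, which splits an image into a half-resolution average plus three detail bands. The function name `haar_dwt2`, the choice of the Haar wavelet, and the band names are assumptions made for this example, not details taken from the paper.

```python
def haar_dwt2(img):
    """One-level 2D Haar transform of an image given as a list of rows.

    Returns four half-resolution bands: (approx, row_diff, col_diff, diag).
    Hypothetical helper for illustration; height and width must be even.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    approx, row_diff, col_diff, diag = [], [], [], []
    for i in range(0, h, 2):
        r_a, r_r, r_c, r_d = [], [], [], []
        for j in range(0, w, 2):
            # 2x2 block:  a b
            #             c d
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_a.append((a + b + c + d) / 2)  # low-pass: scaled local average
            r_r.append((a + b - c - d) / 2)  # top row minus bottom row
            r_c.append((a - b + c - d) / 2)  # left column minus right column
            r_d.append((a - b - c + d) / 2)  # diagonal detail
        approx.append(r_a)
        row_diff.append(r_r)
        col_diff.append(r_c)
        diag.append(r_d)
    return approx, row_diff, col_diff, diag


# A 2x2 patch with a sharp left/right step: only the low-pass band and the
# left-minus-right band are non-zero.
bands = haar_dwt2([[1.0, 0.0], [1.0, 0.0]])
print(bands)
```

With this normalization the transform is orthonormal, so the detail bands isolate abrupt local changes (such as block boundaries) without altering the total signal energy.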

Paper Structure (15 sections, 8 equations, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Overview of the BFRFormer framework. (a) The encoder-generator architecture; (b) TB, the Transformer-based block of the generator; (c) the Aggregated Attention Module (AAM), which combines channel attention (extracting global information) with double attention (extracting local information) to activate more input pixels for face restoration.
  • Figure 2: Comparison of BFRFormer variants. (a) Low-quality input; (b) Perceptual Loss; (c) bCR; (d) CAB; (e) Facial Component; (f) Ground Truth.
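
The Figure 1(c) caption describes channel attention as the branch of the AAM that extracts global information. A minimal sketch of that general idea, in the squeeze-and-excitation style, is given below: each channel is pooled to a single global statistic, a small gating network turns those statistics into per-channel weights, and the feature map is rescaled. The function `channel_attention`, the weight matrices `w1` and `w2`, and the two-layer gate are hypothetical choices for illustration; the paper's actual CAB may differ.

```python
import math


def channel_attention(feat, w1, w2):
    """Rescale each channel of `feat` by a gate computed from global statistics.

    feat: list of C channels, each a list of H rows of W floats.
    w1:   hidden x C weight matrix (hypothetical learned parameters).
    w2:   C x hidden weight matrix (hypothetical learned parameters).
    Returns (rescaled feature map, per-channel gates).
    """
    # Squeeze: global average pooling gives one number per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    # Excite: linear layer + ReLU, then linear layer + sigmoid.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
        for row in w2
    ]
    # Rescale: every pixel of a channel is multiplied by that channel's gate.
    out = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feat)]
    return out, gates


# Two 2x2 channels, one hidden unit; the weights are arbitrary example values.
feat = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
out, gates = channel_attention(feat, w1=[[1.0, -1.0]], w2=[[2.0], [-2.0]])
print(gates)
```

Because the gate for each channel depends on the pooled statistics of all channels, every output pixel is influenced by the whole input, which is the sense in which this branch is "global".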