A2BFR: Attribute-Aware Blind Face Restoration

Chenxin Zhu, Yushun Fang, Lu Liu, Shibo Yin, Xiaohong Liu, Xiaoyun Zhang, Qiang Hu, Guangtao Zhai

Abstract

Blind face restoration (BFR) aims to recover high-quality facial images from degraded inputs, yet its inherently ill-posed nature leads to ambiguous and uncontrollable solutions. Recent diffusion-based BFR methods improve perceptual quality but remain uncontrollable, whereas text-guided face editing enables attribute manipulation without reliable restoration. To address these issues, we propose A$^2$BFR, an attribute-aware blind face restoration framework that unifies high-fidelity reconstruction with prompt-controllable generation. Built upon a Diffusion Transformer backbone with unified image-text cross-modal attention, A$^2$BFR jointly conditions the denoising trajectory on both degraded inputs and textual prompts. To inject semantic priors, we introduce attribute-aware learning, which supervises denoising latents using facial attribute embeddings extracted by an attribute-aware encoder. To further enhance prompt controllability, we introduce semantic dual-training, which leverages the pairwise attribute variations in our newly curated AttrFace-90K dataset to enforce attribute discrimination while preserving fidelity. Extensive experiments demonstrate that A$^2$BFR achieves state-of-the-art performance in both restoration fidelity and instruction adherence, outperforming diffusion-based BFR baselines by -0.0467 LPIPS and +52.58% attribute accuracy, while enabling fine-grained, prompt-controllable restoration even under severe degradations.
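The abstract describes attribute-aware learning as supervising denoising latents with facial attribute embeddings from an attribute-aware encoder. As a minimal illustrative sketch (not the paper's actual objective, whose exact form is not given here), such supervision could be a cosine-alignment loss between a projected denoising latent and the attribute embedding; the function name and vector shapes below are assumptions:

```python
import numpy as np

def attribute_alignment_loss(denoise_latent, attr_embedding):
    """Hypothetical alignment term: pull the denoising latent toward the
    attribute embedding produced by the attribute-aware encoder.
    Returns 1 - cosine similarity, so 0 means perfectly aligned."""
    z = denoise_latent / np.linalg.norm(denoise_latent)
    a = attr_embedding / np.linalg.norm(attr_embedding)
    return 1.0 - float(np.dot(z, a))

rng = np.random.default_rng(0)
z = rng.standard_normal(512)
loss_aligned = attribute_alignment_loss(z, z)    # same direction -> ~0
loss_opposed = attribute_alignment_loss(z, -z)   # opposite direction -> ~2
```

In practice such a term would be added to the standard diffusion denoising loss, so the latent carries attribute semantics without sacrificing reconstruction.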

Paper Structure

This paper contains 16 sections, 6 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Left: Prompt-guided specific restoration avoids uncontrolled solutions in ill-posed BFR. Middle: Compared with restore-only and restore-then-edit pipelines, A$^\text{2}$BFR achieves faithful yet attribute-aligned restoration. Right: Quantitative results show that A$^\text{2}$BFR simultaneously achieves superior fidelity and attribute accuracy, outperforming all compared restoration and editing methods.
  • Figure 2: AttrFace-90K overview. (a) Data construction pipeline: starting from HQ faces (FFHQ/ReFace-HQ/CelebA-HQ), we extract features, classify attributes, build prompts with Qwen3, and perform attribute editing with FlowEdit+SD3.5, followed by post-checks. (b) Attribute distribution: percentage of each controllable attribute across 90K pairs. (c) Image pair examples: source–target pairs for typical attributes.
  • Figure 3: Overview of our $\text{A}^\text{2}$BFR. Left: The semantic dual-training (SDT) strategy pairs a single LQ input with two GT images driven by different attribute prompts, and enforces maximal separation between their latent representations in the semantic space. This encourages the model to learn attribute-discriminative, prompt-aligned behaviors. Right: At inference time, $\text{A}^\text{2}$BFR takes an LQ image and a user-specified prompt, and generates an attribute-aligned, high-quality restoration through denoising process.
  • Figure 4: Qualitative comparison of restoration quality on CelebRef-HQ-Test. A$^\text{2}$BFR restores sharper textures and more natural facial details while preserving better fidelity, surpassing diffusion-based baselines.
  • Figure 5: Qualitative comparison of attribute alignment on AttrFace-90K-Test. The top row shows one-stage restoration results, while the second and third rows display the results of the two-stage restore-then-edit pipeline. A$^\text{2}$BFR (bottom) achieves better visual quality with accurate attribute control under the provided prompts.
  • ...and 2 more figures
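The semantic dual-training strategy in Figure 3 pairs one LQ input with two GT images driven by different attribute prompts and enforces separation between their latent representations while preserving fidelity. A minimal sketch of such an objective, assuming a hinge on cosine distance plus per-branch reconstruction residuals (the function name, margin, and weighting are assumptions, not the paper's stated loss):

```python
import numpy as np

def sdt_loss(lat_a, lat_b, res_a, res_b, margin=1.0, w=0.5):
    """Hypothetical semantic dual-training objective:
    - a hinge term pushing the two prompt-conditioned latents apart
      (active when their cosine distance falls below `margin`),
    - plus reconstruction residual terms keeping each branch faithful
      to its own ground-truth image."""
    a = lat_a / np.linalg.norm(lat_a)
    b = lat_b / np.linalg.norm(lat_b)
    cos_dist = 1.0 - float(np.dot(a, b))
    sep = max(0.0, margin - cos_dist)                   # attribute discrimination
    fid = float(np.mean(res_a**2) + np.mean(res_b**2))  # fidelity to each GT
    return sep + w * fid

# Orthogonal latents (cosine distance 1.0) with perfect reconstruction
# incur zero loss under margin=1.0; identical latents are penalized.
e1, e2 = np.eye(2)
zero_res = np.zeros(4)
loss_separated = sdt_loss(e1, e2, zero_res, zero_res)
loss_collapsed = sdt_loss(e1, e1, zero_res, zero_res)
```

The hinge form lets the separation pressure vanish once the two latents are sufficiently distinct, so the fidelity terms dominate late in training.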