DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration

Nan Gao, Jia Li, Huaibo Huang, Zhi Zeng, Ke Shang, Shuwu Zhang, Ran He

TL;DR

DiffMAC tackles the challenge of high-generalization blind face restoration across diverse, out-of-domain degradations by introducing a diffusion-information-diffusion (DID) framework. It couples a Stage I AdaIN-guided diffusion that aligns the low-quality face to a stable HQ manifold with a Stage II manifold information bottleneck (MIB) that compresses restoration-relevant information while injecting identity cues, all operating within a finetuned latent diffusion backbone (Stable Diffusion v2.1) and a shared VAE. The approach is validated on synthetic and real-world datasets, demonstrating superior fidelity and consistency in both photorealistic and heterogeneous domains, and is complemented by ablations and a user study confirming perceptual benefits. The key contributions include (1) the high-generalization DID framework, (2) the novel manifold information bottleneck module, and (3) evidence of competitive performance without reliance on banked priors, enabling robust BFR across diverse scenes with controllable identity preservation. Overall, DiffMAC provides a practical, scalable solution for real-world blind face restoration with improved generalization and interpretability through information-theoretic constraints on the diffusion manifold.

Abstract

Blind face restoration (BFR) is a highly challenging problem due to the uncertainty of degradation patterns. Current methods have low generalization across photorealistic and heterogeneous domains. In this paper, we propose a Diffusion-Information-Diffusion (DID) framework to tackle diffusion manifold hallucination correction (DiffMAC), which achieves high-generalization face restoration in diverse degraded scenes and heterogeneous domains. Specifically, the first diffusion stage aligns the restored face with spatial feature embedding of the low-quality face based on AdaIN, which synthesizes degradation-removal results but with uncontrollable artifacts for some hard cases. Based on Stage I, Stage II considers information compression using manifold information bottleneck (MIB) and finetunes the first diffusion model to improve facial fidelity. DiffMAC effectively fights against blind degradation patterns and synthesizes high-quality faces with attribute and identity consistencies. Experimental results demonstrate the superiority of DiffMAC over state-of-the-art methods, with a high degree of generalization in real-world and heterogeneous settings. The source code and models will be public.
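
To make the AdaIN-based alignment in Stage I more concrete, here is a minimal sketch of the generic adaptive instance normalization operation: each channel of one feature map is renormalized to carry the per-channel mean and standard deviation of another. This illustrates standard AdaIN only, not DiffMAC's actual module. The names `content`, `style`, `channel_stats`, and `adain` are illustrative assumptions, and the abstract does not say which DiffMAC tensors play which role.

```python
import math

def channel_stats(channel, eps=1e-5):
    """Return (mean, std) of one flattened feature channel."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    return mean, math.sqrt(var + eps)  # eps avoids division by zero

def adain(content, style, eps=1e-5):
    """Generic AdaIN: give each content channel the style channel's mean/std.

    content, style: lists of channels; each channel is a flat list of floats.
    Which DiffMAC features act as content or style is an assumption here.
    """
    out = []
    for c_chan, s_chan in zip(content, style):
        c_mean, c_std = channel_stats(c_chan, eps)
        s_mean, s_std = channel_stats(s_chan, eps)
        out.append([s_std * (v - c_mean) / c_std + s_mean for v in c_chan])
    return out

# Toy single-channel example (values are made up for illustration).
content = [[0.0, 1.0, 2.0, 3.0]]
style = [[10.0, 10.0, 14.0, 14.0]]
aligned = adain(content, style)
```

After the call, `aligned[0]` keeps the relative ordering of the content values but takes on the mean and, up to the `eps` term, the standard deviation of the style channel.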

Paper Structure (19 sections, 14 equations, 12 figures, 6 tables, 1 algorithm)

Figures (12)

  • Figure 1: Our proposed DiffMAC approach produces less of the face hallucination introduced by deep learning models and achieves better visual quality than other state-of-the-art methods (CodeFormer \cite{codeformer}, VQFR \cite{vqfr}, DifFace \cite{difface}, FeMaSR \cite{femasr}, GPEN \cite{GPEN}, GFPGAN \cite{gfp2021}, DFDNet \cite{dfdnet2020}, DiffBIR \cite{diffbir}) in photorealistic and heterogeneous domains. For the subjective study, see Table \ref{tab:p2h}.
  • Figure 2: Our proposed DiffMAC approach achieves more promising BFR results with higher fidelity in both photorealistic and heterogeneous scenarios. More comparisons are shown in Figs. \ref{fig:stage2}, \ref{fig:photo}, \ref{fig:heter}, and \ref{fig:heterm}.
  • Figure 3: The DID (diffusion-information-diffusion) pipeline of our proposed DiffMAC. The first diffusion model transfers random degradation patterns to a relatively stable restoration mode that is well suited to applying an information bottleneck. A second, controllable diffusion model based on the compressed manifold is then used to obtain better BFR. More results are shown in Fig. \ref{fig:stage2}.
  • Figure 4: The DID framework adopts a two-stage diffusion finetuning strategy to tackle the challenging HG-BFR task, comprising one AdaIN-based diffusion model and another with MIB. We implement diffusion manifold hallucination correction (DiffMAC) from the distorted manifold $\mathcal{E}_{X_{LQ}}$ to the optimized manifold $\mathcal{E}_{X_{D1}}$ for Stage I and to $\mathcal{E}^{'}(X_{LQ})$, i.e., $Z$ in Algorithm \ref{alg:alg1}, for Stage II, based on the information bottleneck. The compressed manifold with identity injection is then used to accurately control the feature transformations of the pre-trained Stable Diffusion model finetuned in Stage I. QC denotes the $quant\_conv$ layer of the pre-trained VAE encoder. Overall, the DiffMAC framework achieves high-quality BFR results that are robust across diverse domains.
  • Figure 5: MIB plays an important role in severely degraded scenarios. Stage II produces more natural and cleaner results than Stage I, which is beleaguered by facial noise and ambiguous textures. Moreover, CodeFormer \cite{codeformer} produces messy hair (rows 2, 3, and 5) and incompatible eyes (rows 4 and 6). GPEN \cite{GPEN} shows more obvious artifacts, and DiffBIR \cite{diffbir} yields blurrier images, in contrast to Stage II of DiffMAC.
  • ...and 7 more figures
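
The captions above refer to an information bottleneck that compresses the manifold before Stage II, but this summary does not state the MIB objective itself. As a rough point of reference only, the sketch below shows the compression term of the generic variational information bottleneck: a KL divergence that pulls a diagonal Gaussian code toward a standard normal prior, added to a task loss with weight `beta`. The function names, the Gaussian parameterization, and the default `beta` are assumptions for illustration and may differ from what DiffMAC actually optimizes.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )

def ib_loss(task_loss, mu, log_var, beta=1e-3):
    """Generic IB-style objective: task term plus beta-weighted compression.

    beta is a hypothetical trade-off weight; larger values compress harder.
    """
    return task_loss + beta * kl_to_standard_normal(mu, log_var)

# A code that already matches the prior incurs no compression penalty.
no_penalty = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# Shifting one mean away from zero is penalized quadratically.
penalized = ib_loss(task_loss=1.0, mu=[1.0], log_var=[0.0], beta=0.1)
```

In this toy form, raising `beta` trades restoration accuracy for a more compressed code, which is the general trade-off an information bottleneck is meant to control.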