LAFR: Efficient Diffusion-based Blind Face Restoration via Latent Codebook Alignment Adapter

Runyi Li, Bin Chen, Jian Zhang, Radu Timofte

TL;DR

The paper tackles the problem of latent-space misalignment in diffusion-based blind face restoration, where HQ-trained VAE encoders misinterpret severely degraded inputs. It introduces LAFR, a lightweight Latent Codebook Alignment Adapter that aligns LQ latents to the HQ latent distribution using a learned codebook, coupled with a two-stage framework that fine-tunes only a small portion of the model via LoRA while freezing the VAE. A multi-level restoration loss incorporating appearance, identity, and facial structure priors preserves identity and realism without conditioning on degraded inputs. Data-efficient adaptation is demonstrated: only 600 training images (~0.9% of FFHQ) and 7.5M trainable parameters are needed, achieving performance on par with state-of-the-art methods and offering a 70% reduction in training time, with strong results on both synthetic and real-world benchmarks.

Abstract

Blind face restoration from low-quality (LQ) images is a challenging task that requires not only high-fidelity image reconstruction but also the preservation of facial identity. While diffusion models like Stable Diffusion have shown promise in generating high-quality (HQ) images, their VAE modules are typically trained only on HQ data, resulting in semantic misalignment when encoding LQ inputs. This mismatch significantly weakens the effectiveness of LQ conditions during the denoising process. Existing approaches often tackle this issue by retraining the VAE encoder, which is computationally expensive and memory-intensive. To address this limitation efficiently, we propose LAFR (Latent Alignment for Face Restoration), a novel codebook-based latent space adapter that aligns the latent distribution of LQ images with that of their HQ counterparts, enabling semantically consistent diffusion sampling without altering the original VAE. To further enhance identity preservation, we introduce a multi-level restoration loss that combines constraints from identity embeddings and facial structural priors. Additionally, by leveraging the inherent structural regularity of facial images, we show that lightweight finetuning of the diffusion prior on just 0.9% of the FFHQ dataset is sufficient to achieve results comparable to state-of-the-art methods while reducing training time by 70%. Extensive experiments on both synthetic and real-world face restoration benchmarks demonstrate the effectiveness and efficiency of LAFR, which achieves high-quality, identity-preserving face reconstruction from severely degraded inputs.
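The abstract does not spell out the adapter's internals, so the following is only a rough illustration of the general idea behind a codebook: each latent token is snapped to its nearest entry in a set of reference vectors. It is plain nearest-neighbour vector quantization, not the LAFR adapter itself. The names `nearest_code` and `align_latent`, the toy codebook, and the 2-D tokens are all hypothetical.

```python
def nearest_code(vector, codebook):
    """Return (index, entry) of the codebook entry closest to `vector`
    in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


def align_latent(latent_tokens, codebook):
    """Replace every latent token with its nearest codebook entry."""
    return [nearest_code(token, codebook)[1] for token in latent_tokens]


# Toy data (made up): in a real system the codebook would be learned
# from HQ latents; here it is three hand-picked 2-D vectors.
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
toy_lq_latent = [[0.9, 1.2], [0.1, -0.2], [-0.7, 0.8]]

aligned = align_latent(toy_lq_latent, toy_codebook)
```

In this toy setting each slightly perturbed token is pulled back onto a reference vector, which is the intuition behind mapping LQ latents toward an HQ latent distribution. A trained adapter would operate on VAE latents and would typically be learned end to end rather than hand-specified.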

Paper Structure

This paper contains 19 sections, 3 equations, 13 figures, 10 tables.

Figures (13)

  • Figure 1: Comparison of our proposed LAFR with state-of-the-art face restoration methods. (a) Performance and efficiency across different methods. Bubble size reflects the number of learnable parameters. Our LAFR achieves the best balance, delivering superior quality with minimal computational cost and parameter count. (b) Quantitative comparison across multiple metrics. LAFR outperforms other methods, demonstrating its effectiveness. All metrics are normalized, and for metrics where lower values indicate better performance, we take their reciprocal to ensure consistent visual interpretation across all axes. * means re-trained on FFHQ [karras2019style].
  • Figure 2: Network Structure of our proposed latent codebook alignment adapter.
  • Figure 3: Usage in our SR pipeline.
  • Figure 4: Alignment module in FaithDiff [chen2024faithdiff] and its SR pipeline.
  • Figure 5: Re-trained VRE in OSDFace [wang2025osdface].
  • ...and 8 more figures
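
The metric handling described in the Figure 1 caption (reciprocal for lower-is-better metrics, then normalization) can be sketched as below. The caption does not say which normalization is used, so dividing by the per-metric maximum is an assumption made here for illustration. The function name `normalize_for_radar`, the method names, and all numbers are made up.

```python
def normalize_for_radar(scores, lower_is_better):
    """Orient every metric so that larger is better, then scale each
    metric so the best method scores 1.0.

    scores: {metric_name: {method_name: value}}, values must be positive.
    lower_is_better: set of metric names where smaller raw values are better.
    """
    normalized = {}
    for metric, per_method in scores.items():
        if metric in lower_is_better:
            # Reciprocal flips the ordering so the best method has the largest value.
            oriented = {m: 1.0 / v for m, v in per_method.items()}
        else:
            oriented = dict(per_method)
        best = max(oriented.values())
        # Max-scaling is an assumed choice; the source only says "normalized".
        normalized[metric] = {m: v / best for m, v in oriented.items()}
    return normalized


# Made-up values for two hypothetical methods.
toy_scores = {
    "PSNR": {"method_a": 20.0, "method_b": 25.0},   # higher is better
    "LPIPS": {"method_a": 0.5, "method_b": 0.25},   # lower is better
}
radar = normalize_for_radar(toy_scores, lower_is_better={"LPIPS"})
```

After this step every axis reads the same way: a larger normalized value means a better result, so one method's polygon enclosing another's indicates better scores on every plotted metric.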