Dual Associated Encoder for Face Restoration

Yu-Ju Tsai, Yu-Lun Liu, Lu Qi, Kelvin C. K. Chan, Ming-Hsuan Yang

TL;DR

Blind face restoration under severe, unknown degradations is made harder by a domain gap between HQ and LQ images. The proposed DAEFR framework introduces a dedicated auxiliary LQ encoder and a two-stage association-and-fusion strategy that aligns HQ and LQ representations and predicts HQ code indices with a transformer, restoring high-quality faces while preserving identity. Key contributions include the auxiliary LQ branch, a patch-based association inspired by CLIP, and a multi-head cross-attention fusion that improves code-prediction accuracy and output fidelity on both synthetic and real-world datasets. The approach is aimed at robustness under strong degradation, where it reports improved perceptual quality and identity preservation on real-world inputs.

Abstract

Restoring facial details from low-quality (LQ) images has remained a challenging problem due to its ill-posedness induced by various degradations in the wild. The existing codebook prior mitigates the ill-posedness by leveraging an autoencoder and learned codebook of high-quality (HQ) features, achieving remarkable quality. However, existing approaches in this paradigm frequently depend on a single encoder pre-trained on HQ data for restoring HQ images, disregarding the domain gap between LQ and HQ images. As a result, the encoding of LQ inputs may be insufficient, resulting in suboptimal performance. To tackle this problem, we propose a novel dual-branch framework named DAEFR. Our method introduces an auxiliary LQ branch that extracts crucial information from the LQ inputs. Additionally, we incorporate association training to promote effective synergy between the two branches, enhancing code prediction and output quality. We evaluate the effectiveness of DAEFR on both synthetic and real-world datasets, demonstrating its superior performance in restoring facial details. Project page: https://liagm.github.io/DAEFR/

Paper Structure (37 sections, 10 equations, 20 figures, 6 tables)

Figures (20)

  • Figure 1: Comparison to existing framework. (a) Existing codebook prior approaches learn an encoder in the first stage. During the restoration stage, these approaches utilize LQ images to fine-tune the encoder using pre-trained weights obtained from HQ images. However, this approach introduces a domain bias due to a domain gap and overlooks the distinct feature representations between the encoder and LQ input images. (b) In the codebook learning stage, we propose the integration of an auxiliary branch specifically designed for encoding LQ information. This auxiliary branch is trained exclusively using LQ data to address domain bias and obtain precise feature representation. Furthermore, we introduce an association stage and feature fusion module to enhance the integration of information from both encoders and assist our restoration pipeline.
  • Figure 2: Proposed DAEFR framework. (a) Initially, we train the autoencoder and discrete codebook for both HQ and LQ image domains through self-reconstruction. (b) Once we obtain both encoders ($E_{H}$ and $E_{L}$), we divide the feature ($Z_{h}$ and $Z_{l}$) into patches ($P^{H}_{i}$ and $P^{L}_{i}$) and construct a similarity matrix $M_\text{assoc}$ that associates HQ and LQ features while incorporating spatial information. To promote maximum similarity between patch features, we employ a cross-entropy loss function to maximize the diagonal of the matrix. (c) After obtaining the associated encoders ($E^{A}_{H}$ and $E^{A}_{L}$), we use a multi-head cross-attention module (MHCA) to merge the features ($Z^{A}_{h}$ and $Z^{A}_{l}$) from the associated encoders, generating fused features $Z^{A}_{f}$. We then input the fused feature $Z^{A}_{f}$ to the transformer $\mathbf{T}$, which predicts the corresponding code index $\mathbf{s}$ for the HQ codebook $\mathbb{C}_{h}$. Finally, we use the predicted code index to retrieve the features and feed them to the HQ decoder $D_{H}$ to restore the image.
  • Figure 3: Qualitative comparison on real-world datasets. The BRIAR-Test dataset includes clean reference images of each identity, which confirm that the individual in this image is not wearing glasses. Our DAEFR method exhibits robustness in restoring high-quality faces even under heavy degradation.
  • Figure 4: Qualitative comparison on the synthetic CelebA-Test dataset. Our DAEFR method exhibits robustness in restoring high-quality faces even under heavy degradation.
  • Figure 5: Ablation studies. The experiment indices follow the configuration in Table \ref{tab:ablation}. Our method successfully produces intricate facial details and closely resembles the ground truth, even when the input undergoes severe degradation. Importantly, we effectively retain the identity information from the degraded input.
  • ...and 15 more figures
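
The association stage described in the Figure 2 caption (patch features from both encoders, a similarity matrix $M_\text{assoc}$, and a cross-entropy loss that pushes up its diagonal) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes one patch per spatial location, cosine similarity, a temperature of 0.07, and a symmetric CLIP-style loss over rows and columns; none of these details are stated in the caption. The function names (`split_into_patches`, `association_loss`) are hypothetical.

```python
import numpy as np

def split_into_patches(z):
    """Flatten a (C, H, W) feature map into H*W patch vectors of length C.
    Assumption: each spatial location is treated as one patch."""
    c, h, w = z.shape
    return z.reshape(c, h * w).T  # shape (H*W, C)

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def association_loss(z_h, z_l, temperature=0.07):
    """Cross-entropy over a patch similarity matrix.

    Row i holds the similarities between HQ patch i and every LQ patch.
    The target for row i (and column i) is index i, so minimizing the
    loss raises the diagonal relative to the off-diagonal entries.
    The temperature and the symmetric form are assumptions.
    """
    p_h = l2_normalize(split_into_patches(z_h))
    p_l = l2_normalize(split_into_patches(z_l))
    m_assoc = p_h @ p_l.T / temperature           # shape (N, N)
    diag = np.arange(m_assoc.shape[0])
    loss_h = -log_softmax(m_assoc, axis=1)[diag, diag].mean()  # HQ -> LQ
    loss_l = -log_softmax(m_assoc, axis=0)[diag, diag].mean()  # LQ -> HQ
    return 0.5 * (loss_h + loss_l), m_assoc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_h = rng.normal(size=(16, 4, 4))                   # stand-in for Z_h
    z_l = z_h + 0.01 * rng.normal(size=z_h.shape)       # stand-in for Z_l
    loss, m_assoc = association_loss(z_h, z_l)
    print("association loss:", loss, "matrix shape:", m_assoc.shape)
```

Because the target of each row is its own index, the loss is low only when the HQ and LQ patches at the same spatial position are the most similar pair, which is how the matrix can carry the spatial information mentioned in the caption. In a trainable setting the same computation would be written in an autodiff framework so that gradients reach both encoders.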