ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration
Yuen-Fui Lau, Tianjia Zhang, Zhefan Rao, Qifeng Chen
TL;DR
ENTED tackles blind face restoration (BFR) by using a high-quality (HQ) reference image to guide texture transfer while correcting the degraded latent representation of the input. It combines a neural texture extraction and distribution framework with two components: a vector-quantized dictionary that replaces corrupted latent features with HQ code words, and a cross-attention-driven latent-space refinement that uses the reference prior to produce HQ style codes. Residual connections help preserve identity. The method reports better perceptual realism, texture fidelity, and identity preservation than prior methods on synthetic and real-world datasets, and ablation studies confirm the benefit of each module. Overall, ENTED advances reference-based BFR by integrating HQ texture guidance, robust latent-space handling, and carefully chosen training losses.
Abstract
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images. Our method repairs a single degraded input image using a high-quality reference image. We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and the reference image. However, the StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images. The latent code extracted from the degraded input image often contains corrupted features, making it difficult to align the semantic information from the input with the high-quality textures from the reference. To overcome this challenge, we employ two techniques. The first, inspired by vector quantization, replaces corrupted semantic features with high-quality code words. The second generates style codes that carry photorealistic texture information from a more informative latent space developed using the high-quality features in the reference image's manifold. Extensive experiments conducted on synthetic and real-world datasets demonstrate that our method produces results with more realistic contextual details and outperforms state-of-the-art methods. A thorough ablation study confirms the effectiveness of each proposed module.
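The first technique can be illustrated with a generic vector-quantization lookup. The sketch below is not ENTED's implementation: it assumes a plain nearest-neighbor search under squared L2 distance, and the names `quantize`, `codebook`, and `degraded` are hypothetical. It shows only the core idea of swapping each corrupted feature vector for the closest entry in a dictionary of high-quality code words.

```python
import numpy as np


def quantize(features, codebook):
    """Replace each feature vector with its nearest code word (squared L2).

    features: array of shape (N, D), e.g. corrupted latent features.
    codebook: array of shape (K, D), e.g. learned high-quality code words.
    Returns the quantized features (N, D) and the chosen indices (N,).
    """
    # Pairwise squared distances via ||f||^2 - 2 f.c + ||c||^2, shape (N, K).
    d2 = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    indices = d2.argmin(axis=1)
    return codebook[indices], indices


# Toy dictionary of four 2-D code words (assumed values, for illustration only).
codebook = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

# "Degraded" features: code words 1, 2, and 3 with small perturbations.
degraded = np.array([[0.1, 0.9], [0.8, 0.2], [0.95, 1.1]])

quantized, idx = quantize(degraded, codebook)
print(idx)
print(quantized)
```

In a real model the codebook would be learned from high-quality faces and the lookup would run over a spatial feature map rather than three hand-picked vectors; the abstract does not specify those details, so they are left out here.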
