ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration

Yuen-Fui Lau, Tianjia Zhang, Zhefan Rao, Qifeng Chen

TL;DR

ENTED tackles blind face restoration by using a high-quality (HQ) reference image to guide texture transfer while repairing degraded latent representations. It combines a neural texture extraction and distribution framework with two additions: a vector-quantized dictionary that replaces corrupted latent features, and a cross-attention-driven latent-space refinement that uses the reference prior to produce HQ style codes. Residual connections preserve identity. On synthetic and real-world datasets, the method reports better perceptual realism, texture fidelity, and identity preservation than state-of-the-art methods, and ablation studies confirm the contribution of each module. Overall, ENTED advances reference-based blind face restoration (BFR) by integrating HQ texture guidance, robust latent-space handling, and carefully chosen training losses to produce realistic portraits under diverse degradations.

Abstract

We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images. Our method involves repairing a single degraded input image using a high-quality reference image. We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and reference image. However, the StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images. The latent code extracted from the degraded input image often contains corrupted features, making it difficult to align the semantic information from the input with the high-quality textures from the reference. To overcome this challenge, we employ two special techniques. The first technique, inspired by vector quantization, replaces corrupted semantic features with high-quality code words. The second technique generates style codes that carry photorealistic texture information from a more informative latent space developed using the high-quality features in the reference image's manifold. Extensive experiments conducted on synthetic and real-world datasets demonstrate that our method produces results with more realistic contextual details and outperforms state-of-the-art methods. A thorough ablation study confirms the effectiveness of each proposed module.
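
The two techniques named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the toy codebook, and the use of raw reference vectors as both keys and values are assumptions made for illustration, and a real model would operate on learned feature maps with learned projections. The first part replaces each degraded feature vector with its nearest codeword, as in vector quantization; the second lets each feature attend over reference features and adds the result back through a residual connection.

```python
import math

def nearest_codeword(feature, codebook):
    """Return (index, codeword) of the entry closest to `feature`
    under squared Euclidean distance."""
    best_index = min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, codebook[k])),
    )
    return best_index, codebook[best_index]

def quantize(features, codebook):
    """Replace every feature vector with its nearest codeword."""
    return [nearest_codeword(f, codebook)[1] for f in features]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_residual(queries, reference):
    """Each query attends over the reference vectors (used here as both
    keys and values, an assumption of this sketch); the attended vector
    is added back to the query as a residual connection."""
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    refined = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, r)) for r in reference]
        weights = softmax(scores)
        attended = [
            sum(w * r[d] for w, r in zip(weights, reference)) for d in range(dim)
        ]
        refined.append([qd + ad for qd, ad in zip(q, attended)])
    return refined

# Hypothetical toy data: three 2-D codewords and two degraded features.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
degraded = [[0.9, 1.2], [3.5, -0.2]]
quantized = quantize(degraded, codebook)
refined = cross_attention_residual(quantized, reference=[[0.5, 0.5]])
```

In this sketch the residual term keeps each quantized feature in the output, so the reference can only add to it rather than overwrite it, which is the intuition behind using residual connections to preserve identity.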

Paper Structure

This paper contains 20 sections, 12 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: When each face restoration is performed using the same reference image and experimental setup, the output without a residual connection tends to degrade the facial identity. However, when a residual connection is used, the result exhibits superior facial details that align more closely with the original image.
  • Figure 2: Visual comparison with state-of-the-art blind face restoration methods. The first row shows results on real-world data and the second row shows results on synthesized data.
  • Figure 3: A summary of our pipeline. Using reference features, we construct a high-quality image by transferring high-quality reference details (Neural Texture Extraction and Distribution Process) and repairing distorted semantic information in the degraded input image (Application of VQ Dictionary and Latent Space Refinement).
  • Figure 4: The first row shows an 8x blind face restoration. Our results display fewer distortions and align more closely with the original images, particularly in the color of the pupils. The second row shows a 4x blind face restoration. Our approach produces skin texture that is more detailed and refined than that of current state-of-the-art methods.