Locally-Focused Face Representation for Sketch-to-Image Generation Using Noise-Induced Refinement

Muhammad Umer Ramzan, Ali Zia, Abdelwahed Khamis, Ayman Elgharabawy, Ahmad Liaqat, Usman Ali

TL;DR

The paper tackles turning simple face sketches into high-fidelity color images, a task with forensic and biometric relevance. It introduces a two-stage framework: first, a CA2N-based locally-focused representation learning stage that extracts five facial component descriptors with a block attention encoder, and second, a noise-induced domain-adaptive cGAN that maps these descriptors to spatial feature maps and generates realistic faces, followed by GFPGAN post-processing. The authors define a rich loss suite, including $L_{GAN}$, $L_{content}$, $L_{perc}$, and $L_{str}$, plus a noise-induced mechanism to improve generalization across unseen sketch domains. Across CelebAMask-HQ, CUHK, and CUFSF, the method achieves state-of-the-art results in FID, IS, KID, SSIM, and PSNR, while demonstrating robustness to different sketch styles; the approach holds promise for practical sketch-to-image synthesis in law-enforcement and related fields, with future work aimed at broader domain generalization.

Abstract

This paper presents a novel deep-learning framework that significantly enhances the transformation of rudimentary face sketches into high-fidelity colour images. Employing a Convolutional Block Attention-based Auto-encoder Network (CA2N), our approach effectively captures and enhances critical facial features through a block attention mechanism within an encoder-decoder architecture. Subsequently, the framework utilises a noise-induced conditional Generative Adversarial Network (cGAN) process that allows the system to maintain high performance even on domains unseen during training. These enhancements lead to considerable improvements in image realism and fidelity, with our model outperforming the best prior method by FID margins of 17, 23, and 38 on the CelebAMask-HQ, CUHK, and CUFSF datasets, respectively. The model sets a new state-of-the-art in sketch-to-image generation, can generalize across sketch types, and offers a robust solution for applications such as criminal identification in law enforcement.

Paper Structure

This paper contains 15 sections, 7 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Illustration of the proposed sketch-to-image generation architecture. (Top) Locally-Focused Face Representation Learning. The CBAM applies channel attention followed by spatial attention to each component of the sketch face to refine feature representations in the autoencoder. (Bottom) Noise-Induced Adversarial Face Generation. The feature descriptors of each facial component are converted into feature maps, which are then used to train the enhanced cGAN. Finally, the generated image is passed through a pre-trained GFPGAN (Wang et al., 2021) to enhance its quality.
  • Figure 2: A qualitative comparison of results between our approach and other methods on CelebAMask-HQ dataset.
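
The Figure 1 caption says the CBAM applies channel attention followed by spatial attention. The sketch below illustrates that ordering in NumPy, following the generic CBAM formulation rather than the authors' CA2N code. The tensor sizes, the reduction ratio `r`, the kernel size `k`, and the random weights are all illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Rescale each channel of feat (C, H, W) by a gate in (0, 1).

    w1 (C//r, C) and w2 (C, C//r) form a shared two-layer MLP that is
    applied to both the average-pooled and max-pooled channel descriptors.
    """
    avg = feat.mean(axis=(1, 2))  # (C,)
    mx = feat.max(axis=(1, 2))    # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer

    gate = sigmoid(mlp(avg) + mlp(mx))  # (C,)
    return feat * gate[:, None, None]


def spatial_attention(feat, kernel):
    """Rescale each spatial location of feat (C, H, W) by a gate in (0, 1).

    kernel (2, k, k), k odd, is convolved over the stacked channel-wise
    mean and max maps with zero padding, so the output keeps size (H, W).
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = feat.shape[1:]
    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return feat * sigmoid(logits)[None, :, :]


def cbam(feat, w1, w2, kernel):
    # Channel attention first, then spatial attention, as in the caption.
    return spatial_attention(channel_attention(feat, w1, w2), kernel)


# Illustrative sizes only; not taken from the paper.
rng = np.random.default_rng(0)
C, H, W, r, k = 8, 16, 16, 2, 7
feat = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
kernel = 0.1 * rng.standard_normal((2, k, k))
refined = cbam(feat, w1, w2, kernel)
```

Both gates are sigmoid outputs, so the refined features keep the input shape and are never larger in magnitude than the input. In a trained network the weights would be learned jointly with the autoencoder, one encoder per facial component.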