IFQA: Interpretable Face Quality Assessment

Byungho Jo, Donghyeon Cho, In Kyu Park, Sungeun Hong

TL;DR

IFQA addresses the mismatch between general IQA metrics and human perception of faces with an interpretable, face-centric metric built on an adversarial framework. A plain restoration generator is paired with a per-pixel, U-Net–based discriminator that emphasizes facial primary regions through FPRS and facial masks; the discriminator produces a pixel-level quality map, and the image-level score is the average of its per-pixel outputs. The metric correlates with human judgments better than traditional NR-IQA and FIQA baselines across multiple datasets and restoration models, and its pixel-level outputs show which facial regions drive a quality assessment. IFQA also generalizes across architectures and datasets and can serve as an objective function to improve face-generation tasks, offering a scalable alternative to costly human studies for evaluating face restoration.
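
The averaging step in the summary above can be illustrated with a minimal sketch. It assumes the discriminator output is already available as a 2-D array of per-pixel scores in [0, 1]; the function names `image_level_score` and `region_score` and the toy values are hypothetical and not taken from the paper's code.

```python
import numpy as np


def image_level_score(quality_map: np.ndarray) -> float:
    """Collapse a per-pixel quality map of shape (H, W) into one score by averaging."""
    if quality_map.ndim != 2:
        raise ValueError("expected a 2-D (H, W) quality map")
    return float(quality_map.mean())


def region_score(quality_map: np.ndarray, mask: np.ndarray) -> float:
    """Average the quality map inside a boolean region mask (hypothetical helper)."""
    if mask.shape != quality_map.shape:
        raise ValueError("mask and quality map must have the same shape")
    if not mask.any():
        raise ValueError("mask selects no pixels")
    return float(quality_map[mask].mean())


# Toy 2x2 map standing in for a discriminator output (assumed values).
toy_map = np.array([[1.0, 0.5],
                    [0.5, 0.0]])
# Assumed mask covering only the top row, standing in for a facial region.
toy_mask = np.array([[True, True],
                     [False, False]])

score = image_level_score(toy_map)
top_row_score = region_score(toy_map, toy_mask)
```

Restricting the average to a mask is one simple way to read the map region by region; how the authors inspect regions in practice is not specified in this summary.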

Abstract

Existing face restoration models have relied on general assessment metrics that do not consider the characteristics of facial regions. Recent works have therefore assessed their methods using human studies, which is not scalable and involves significant effort. This paper proposes a novel face-centric metric based on an adversarial framework where a generator simulates face restoration and a discriminator assesses image quality. Specifically, our per-pixel discriminator enables interpretable evaluation that cannot be provided by traditional metrics. Moreover, our metric emphasizes facial primary regions considering that even minor changes to the eyes, nose, and mouth significantly affect human cognition. Our face-oriented metric consistently surpasses existing general or facial image quality assessment metrics by impressive margins. We demonstrate the generalizability of the proposed strategy in various architectural designs and challenging scenarios. Interestingly, we find that our IFQA can lead to performance improvement as an objective function.

Paper Structure (23 sections, 9 equations, 13 figures, 8 tables)

Figures (13)

  • Figure 1: Which of 'Image A' or 'Image B' is closer to the given reference image or looks high-quality? General full-reference metrics (e.g. PSNR/SSIM), no-reference metrics (e.g. NIQE, BRISQUE, PI), and FIQA methods are inconsistent with human judgment. LPIPS agrees with human judgment but cannot be applied to the blind face restoration scenario. Our IFQA is consistent with human judgment and can provide interpretability maps where the brighter the area, the higher the quality.
  • Figure 2: Comparison of PSNR/SSIM and human assessment on restored face images. PSNR/SSIM give higher scores to 'Image A' than to 'Image B', while human subjects vote 'Image B' as the higher-quality face image.
  • Figure 3: IFQA framework outline. Given HQ images, we obtain LQ images via BFR formulation. The generator (G) mimics face restoration models, while the discriminator (D) is used to evaluate image quality by determining high-quality regions as 'real' and low-quality or restored regions as 'fake'. Through its U-Net architecture, the discriminator is able to evaluate the image pixel-by-pixel. FPRS allows the proposed metric to give more weight to facial primary regions that have a significant impact on human visual perception.
  • Figure 4: Supervision for IFQA metric. Regions from high-quality images provide 'real' labels (yellow), while regions from low-quality or restored face images give 'fake' labels (purple). The red box indicates the randomly selected swapped region.
  • Figure 5: Box plot of restoration models through human study.
  • ...and 8 more figures
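
The per-pixel supervision described in the Figure 4 caption can be sketched roughly as follows. This is only an illustration under stated assumptions: the swapped region is a random rectangle (the paper selects regions using facial masks), the restored patch is pasted into the high-quality image (the swap direction is assumed), and the function name `swap_region` and all shapes are hypothetical.

```python
import numpy as np


def swap_region(hq, restored, rng, box_size):
    """Paste a random rectangle from `restored` into `hq`.

    Returns the mixed image and a per-pixel label map in which
    1.0 marks pixels taken from the high-quality image ('real') and
    0.0 marks pixels taken from the restored image ('fake').
    """
    if hq.shape != restored.shape:
        raise ValueError("images must have the same shape")
    h, w = hq.shape[:2]
    bh, bw = box_size
    if bh > h or bw > w:
        raise ValueError("box does not fit inside the image")

    # Top-left corner chosen uniformly so the box stays inside the image.
    top = int(rng.integers(0, h - bh + 1))
    left = int(rng.integers(0, w - bw + 1))

    mixed = hq.copy()
    labels = np.ones((h, w), dtype=np.float32)
    mixed[top:top + bh, left:left + bw] = restored[top:top + bh, left:left + bw]
    labels[top:top + bh, left:left + bw] = 0.0
    return mixed, labels


# Toy inputs: an all-ones "high-quality" image and an all-zeros "restored" one.
rng = np.random.default_rng(0)
hq = np.ones((8, 8, 3), dtype=np.float32)
restored = np.zeros((8, 8, 3), dtype=np.float32)
mixed, labels = swap_region(hq, restored, rng, box_size=(3, 4))
```

A label map of this kind could supervise a per-pixel discriminator with a pixel-wise real/fake loss; the exact loss and region-selection procedure used by the authors are not given in this summary.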