DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer
Wei-Ting Chen, Gurunandan Krishnan, Qiang Gao, Sy-Yen Kuo, Sizhuo Ma, Jian Wang
TL;DR
This work addresses generic face image quality assessment (GFIQA) with a transformer-based framework built on two components. The first, Self-Supervised Dual-Set Degradation Representation Learning (DSL), decouples degradation from content: it learns global degradation representations by contrasting synthetic degradations applied to high-quality faces with real-world degradations, using a soft proximity mapping and a bidirectional contrastive loss to align degradations across the two sets. The second is a landmark-guided transformer, in which a landmark-detection module and positional encoding focus the model on salient facial regions, improving regional confidence and overall mean opinion score (MOS) prediction. The authors also present CGFIQA-40k, a large, balanced dataset designed to reduce gender and skin-tone biases. On GFIQA-20k, PIQ23, and CGFIQA-40k, DSL-FIQA achieves higher Pearson linear correlation (PLCC) and Spearman rank correlation (SRCC) than strong baselines, indicating that the method is robust on real-world face images.
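For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how PLCC and SRCC can be computed between ground-truth MOS values and predicted scores. It is not the authors' evaluation code: the function names and the sample scores are hypothetical, and any preprocessing the paper may apply before computing PLCC (for example, a nonlinear score mapping) is not covered here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation coefficient (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: ground-truth MOS and model predictions for five images.
mos = [0.10, 0.35, 0.50, 0.80, 0.95]
pred = [0.20, 0.30, 0.60, 0.70, 0.99]

plcc = pearson(mos, pred)
srcc = spearman(mos, pred)
print(f"PLCC = {plcc:.4f}, SRCC = {srcc:.4f}")
```

PLCC measures how linearly the predictions track the MOS values, while SRCC depends only on the ordering, so a model that ranks images correctly but on a distorted scale can score a perfect SRCC with a lower PLCC, as in this toy example.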
Abstract
Generic Face Image Quality Assessment (GFIQA) evaluates the perceptual quality of facial images, which is crucial in improving image restoration algorithms and selecting high-quality face images for downstream tasks. We present a novel transformer-based method for GFIQA, which is aided by two unique mechanisms. First, a Dual-Set Degradation Representation Learning (DSL) mechanism uses facial images with both synthetic and real degradations to decouple degradation from content, ensuring generalizability to real-world scenarios. This self-supervised method learns degradation features on a global scale, providing a robust alternative to conventional methods that use local patch information in degradation learning. Second, our transformer leverages facial landmarks to emphasize visually salient parts of a face image in evaluating its perceptual quality. We also introduce a balanced and diverse Comprehensive Generic Face IQA (CGFIQA-40k) dataset of 40K images carefully designed to overcome the biases, in particular the imbalances in skin tone and gender representation, in existing datasets. Extensive analysis and evaluation demonstrate the robustness of our method, marking a significant improvement over prior methods.
