QGFace: Quality-Guided Joint Training For Mixed-Quality Face Recognition
Youzhe Song, Feng Wang
TL;DR
QGFace addresses mixed-quality face recognition by partitioning training data into high-quality (HQ) and low-quality (LQ) subsets with a feature-norm quality indicator, then applying a classification loss to HQ samples and contrastive learning to LQ samples within a single encoder. The method introduces a proxy-updated real-time contrastive queue that supplies robust positive and negative pairs, mitigating instability in joint training. On HQ, LQ, and mixed-quality datasets (including SCface, Tinyface, and IJB-B), QGFace outperforms several baselines and quality-invariant approaches while remaining end-to-end trainable. The approach is a practical, scalable option for real-world applications with diverse image qualities, and it can be combined with other techniques such as super-resolution or distillation for further improvement.
Abstract
The quality of a face crop is determined by many factors, such as camera resolution, distance, and illumination conditions. This makes discriminating between face images of different qualities a challenging problem in realistic applications. However, most existing approaches are designed specifically for either high-quality (HQ) or low-quality (LQ) images, and their performance degrades on mixed-quality images. In addition, many methods require pre-trained feature extractors or other auxiliary structures to support training and evaluation. In this paper, we point out that the key to understanding HQ and LQ images simultaneously is to apply different learning methods according to image quality. We propose a novel quality-guided joint training approach for mixed-quality face recognition that learns from images of different qualities simultaneously with a single encoder. Based on a quality partition, a classification-based method is employed for learning from HQ data. Meanwhile, LQ images, which lack identity information, are learned with self-supervised image-image contrastive learning. To keep pace with model updates and improve the discriminability of contrastive learning in our joint training scenario, we further propose a proxy-updated real-time queue that composes contrastive pairs from features of the genuine encoder. Experiments on the low-quality datasets SCface and Tinyface, the mixed-quality dataset IJB-B, and five high-quality datasets demonstrate the effectiveness of the proposed approach in recognizing face images of different qualities.
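
The quality-guided split described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the fixed norm threshold, the plain cosine-softmax classifier, the scale and temperature values, and all function names are assumptions made for illustration. In particular, the contrastive branch here uses simple in-batch negatives in place of the proxy-updated real-time queue, whose details are not given in this summary.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def partition_by_norm(features, threshold):
    """Split a batch into HQ/LQ masks using the feature norm as quality proxy.
    The fixed threshold is an assumption, not a value from the paper."""
    hq = np.linalg.norm(features, axis=1) >= threshold
    return hq, ~hq

def classification_loss(features, labels, class_weights, scale=16.0):
    """HQ branch: cross-entropy over cosine logits against class weights."""
    logits = scale * l2_normalize(features) @ l2_normalize(class_weights).T
    return -log_softmax(logits)[np.arange(len(labels)), labels].mean()

def contrastive_loss(anchors, positives, temperature=0.1):
    """LQ branch: InfoNCE where row i of `positives` is the positive for
    row i of `anchors` and all other rows act as in-batch negatives."""
    sim = l2_normalize(anchors) @ l2_normalize(positives).T / temperature
    return -np.diag(log_softmax(sim)).mean()

def joint_loss(features, views, labels, class_weights, threshold):
    """Sum the HQ classification loss and the LQ contrastive loss.
    `views` holds features of a second augmented view of each image."""
    hq, lq = partition_by_norm(features, threshold)
    loss = 0.0
    if hq.any():
        loss += classification_loss(features[hq], labels[hq], class_weights)
    if lq.sum() > 1:  # need at least one in-batch negative
        loss += contrastive_loss(features[lq], views[lq])
    return loss, hq, lq
```

Both branches consume features from the same encoder output, which is the point of the single-encoder design: only the loss applied to each sample changes with its estimated quality.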
