QGFace: Quality-Guided Joint Training For Mixed-Quality Face Recognition

Youzhe Song, Feng Wang

TL;DR

QGFace addresses mixed-quality face recognition by partitioning training data into HQ and LQ based on a feature-norm quality indicator, applying a classification loss to HQ samples and contrastive learning to LQ samples within a single encoder. The method introduces a proxy-updated real-time contrastive queue to supply robust positive and negative pairs, mitigating instability in joint training. Across HQ, LQ, and mixed-quality datasets (including SCface, Tinyface, and IJB-B), QGFace achieves strong performance, outperforming several baselines and quality-invariant approaches while maintaining end-to-end trainability. This approach offers a practical, scalable solution for real-world applications with diverse image qualities and can be readily combined with other techniques such as super-resolution or distillation for further improvements.

Abstract

The quality of a face crop in an image is determined by many factors, such as camera resolution, distance, and illumination conditions. This makes discriminating between face images of different qualities a challenging problem in realistic applications. However, most existing approaches are designed specifically for either high-quality (HQ) or low-quality (LQ) images, and their performance degrades on mixed-quality images. Moreover, many methods require pre-trained feature extractors or other auxiliary structures to support training and evaluation. In this paper, we point out that the key to better understanding both HQ and LQ images simultaneously is to apply different learning methods according to their quality. We propose a novel quality-guided joint training approach for mixed-quality face recognition, which simultaneously learns from images of different qualities with a single encoder. Based on a quality partition, a classification-based method is employed for learning from HQ data. Meanwhile, the LQ images, which lack identity information, are learned with self-supervised image-image contrastive learning. To keep up with model updates and improve the discriminability of contrastive learning in our joint training scenario, we further propose a proxy-updated real-time queue that composes the contrastive pairs from features of the genuine encoder. Experiments on the low-quality datasets SCface and Tinyface, the mixed-quality dataset IJB-B, and five high-quality datasets demonstrate the effectiveness of our proposed approach in recognizing face images of different qualities.
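The quality-guided split described above can be illustrated with a small forward-pass sketch. It is not the authors' implementation: the function names, the norm `threshold`, the logit `scale`, the `temperature`, and the choice of a plain cosine-softmax classifier and an InfoNCE-style contrastive term are all assumptions made for illustration. NumPy has no autodiff, so the stop-gradient on the HQ side is only noted in a comment.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def cross_entropy(logits, targets):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

def partition_by_norm(feats, threshold):
    """Use the feature norm as a quality indicator (threshold is hypothetical)."""
    norms = np.linalg.norm(feats, axis=1)
    hq_mask = norms >= threshold
    return hq_mask, ~hq_mask

def joint_loss(feats, partner_feats, labels, proxies, negatives,
               threshold=1.0, scale=16.0, temperature=0.1):
    """Classification loss on HQ rows, contrastive loss on LQ rows.

    feats:         (B, D) features of the (augmented) batch
    partner_feats: (B, D) features of each row's related HQ view
    proxies:       (C, D) one identity proxy per class
    negatives:     (K, D) features used as contrastive negatives
    """
    hq, lq = partition_by_norm(feats, threshold)

    loss_cls = 0.0
    if hq.any():
        # Cosine-similarity logits against the identity proxies.
        logits = scale * l2_normalize(feats[hq]) @ l2_normalize(proxies).T
        loss_cls = cross_entropy(logits, labels[hq])

    loss_con = 0.0
    if lq.any():
        q = l2_normalize(feats[lq])
        # In an autodiff framework this HQ side would be detached (stop-gradient).
        k = l2_normalize(partner_feats[lq])
        n = l2_normalize(negatives)
        pos = (q * k).sum(axis=1, keepdims=True)   # (B_lq, 1) positive similarity
        neg = q @ n.T                              # (B_lq, K) negative similarities
        logits = np.concatenate([pos, neg], axis=1) / temperature
        # The positive pair always sits in column 0.
        loss_con = cross_entropy(logits, np.zeros(len(q), dtype=int))

    return loss_cls, loss_con
```

A training loop would combine the two returned terms, for example as a weighted sum; the weighting is not specified in this summary and is left to the caller.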

Paper Structure (19 sections, 7 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Conventional pure classification method vs. our QGFace. (a) An FR training pipeline with a single classification loss. All input images are fed into a single main loss. (b) Our pipeline additionally applies a contrastive loss to low-quality images. We apply data augmentation to the images and split their features into two parts: HQ and LQ. The HQ images are supervised by the classification loss, whereas the LQ images and their related HQ images are sent to the contrastive function. The gradient flow of the HQ data is stopped in contrastive learning. A real-time queue is designed to provide an effective feature queue and support large-scale feature comparison.
  • Figure 2: Illustration of contrastive learning for the LQ features. The distance between a data point and the origin denotes its feature norm. The GST shows that AdaFace places considerable emphasis on the low-quality samples. The classification process matches the images with abstract identity proxies, which is challenging when learning from LQ data. We apply instance-level contrastive learning, as shown in (a), to these images to relieve the learning burden.
  • Figure 3: (a) The updating process of our proposed proxy-updated real-time queue; (b) The difference between the positive and the negative pairs with different queues. The momentum queue faces a strongly limited boundary with the steady and whitened features. With our queue, contrastive learning can take the features from the training encoder to compose the positive pairs.
  • Figure 4: Examples of the image pairs. Every two columns show several pairs of original images and their augmented images. The pairs in the red boxes contain LQ images, while the pairs in the green boxes are all HQ images. Our quality partitioning strategy is capable of identifying low-quality images that are blurred, occluded, or contain only part of a face.
  • Figure 5: Similarity difference between QGFace and the Baseline on SCface. The histograms illustrate the similarity between probes and their related gallery images (matched pairs), where our method shows obvious advantages. The title of each sub-figure indicates the sub-dataset and the improvement of the difference on the mean similarity between the matched pairs and the most similar unmatched pairs.
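
The real-time queue mentioned in Figures 1 and 3 can be pictured as a fixed-size buffer holding features produced by the encoder being trained. The sketch below is a hypothetical illustration only: the class name, its methods, and in particular the `apply_proxy_delta` rule (shifting each stored feature by the change in its identity proxy) are assumptions, since this summary does not spell out the actual update rule.

```python
import numpy as np

class FeatureQueue:
    """Fixed-size FIFO buffer of features and their identity labels (hypothetical)."""

    def __init__(self, capacity, dim):
        self.capacity = capacity
        self.feats = np.zeros((capacity, dim))
        self.labels = np.full(capacity, -1, dtype=int)  # -1 marks an empty slot
        self.ptr = 0                                    # next slot to overwrite

    def enqueue(self, feats, labels):
        """Insert features from the current encoder, overwriting the oldest entries."""
        for f, y in zip(feats, labels):
            self.feats[self.ptr] = f
            self.labels[self.ptr] = y
            self.ptr = (self.ptr + 1) % self.capacity

    def apply_proxy_delta(self, proxy_delta):
        """ASSUMPTION: move each stored feature by the change in its class proxy,
        so that older entries follow the model update. proxy_delta has shape (C, D)."""
        valid = self.labels >= 0
        self.feats[valid] += proxy_delta[self.labels[valid]]

    def negatives_for(self, label):
        """Return stored features whose identity differs from `label`."""
        mask = (self.labels >= 0) & (self.labels != label)
        return self.feats[mask]

# Usage: a queue of 3 slots receiving 4 features wraps around and overwrites slot 0.
queue = FeatureQueue(capacity=3, dim=2)
queue.enqueue(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]), [0, 1, 2, 0])
queue.apply_proxy_delta(np.array([[0.5, 0.0], [0.0, 0.0], [0.0, -1.0]]))
negatives = queue.negatives_for(0)  # features of identities 1 and 2
```

A ring buffer keeps memory constant regardless of training length, which is why it is a common choice for large contrastive negative sets; whether QGFace uses exactly this layout is not stated here.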