Impact of Blur and Resolution on Demographic Disparities in 1-to-Many Facial Identification

Aman Bhatta, Gabriella Pangelinan, Michael C. King, Kevin W. Bowyer

TL;DR

The study examines how demographic disparities manifest in open-set 1-to-many facial identification and how probe quality (blur and resolution) typical of surveillance footage affects accuracy. It introduces two new metrics alongside the conventional $d'$ measure and evaluates ArcFace, MagFace, and AdaFace on the MORPH dataset under both government-ID and surveillance-like conditions. Key findings: 1-to-many disparity rankings differ from 1-to-1 rankings, unequal numbers of identities and images across groups affect the measured gaps, and blur or low resolution can sharply increase the false positive identification rate, with male / female differences larger than African-American / Caucasian differences under degradation. The work has practical implications for quality-aware evaluation and threshold-setting in real-world surveillance deployments, and it provides a framework for assessing probe quality effects in 1-to-many face identification.

Abstract

Most studies to date that have examined demographic variations in face recognition accuracy have analyzed 1-to-1 matching accuracy, using images that could be described as "government ID quality". This paper analyzes the accuracy of 1-to-many facial identification across demographic groups, and in the presence of blur and reduced resolution in the probe image as might occur in "surveillance camera quality" images. Cumulative match characteristic (CMC) curves are not appropriate for comparing propensity for rank-one recognition errors across demographics, and so we use three metrics for our analysis: (1) the well-known d' metric between mated and non-mated score distributions, and, introduced in this work, (2) the absolute score difference between thresholds in the high-similarity tail of the non-mated distribution and the low-similarity tail of the mated distribution, and (3) the distribution of (mated - non-mated rank-one scores) across the set of probe images. We find that demographic variation in 1-to-many accuracy does not entirely follow what has been observed in 1-to-1 matching accuracy. Also, unlike 1-to-1 accuracy, demographic comparison of 1-to-many accuracy can be affected by different numbers of identities and images across demographics. More importantly, we show that increased blur in the probe image, or reduced resolution of the face in the probe image, can significantly increase the false positive identification rate. We also show that the demographic variation in these high-blur or low-resolution conditions is much larger for male / female than for African-American / Caucasian. The point that 1-to-many accuracy can potentially collapse when processing "surveillance camera quality" probe images against a "government ID quality" gallery is an important one.
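
The three metrics named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the usual definition of d' (absolute difference of means divided by the square root of the average variance), and the function names, the `tail_fraction` parameter, and its default value are hypothetical choices made here. The tail fraction the paper actually uses is not stated in this summary.

```python
import math
import statistics


def d_prime(mated, non_mated):
    """Usual d' definition: gap between means over the root of the mean variance."""
    mean_gap = abs(statistics.fmean(mated) - statistics.fmean(non_mated))
    mean_var = (statistics.pvariance(mated) + statistics.pvariance(non_mated)) / 2
    return mean_gap / math.sqrt(mean_var)


def tail_threshold_gap(mated, non_mated, tail_fraction=0.01):
    """Absolute gap between two tail thresholds.

    tail_fraction is an assumed parameter: the share of scores lying beyond
    each threshold (high tail for non-mated, low tail for mated).
    """
    k_non_mated = max(1, int(len(non_mated) * tail_fraction))
    k_mated = max(1, int(len(mated) * tail_fraction))
    high_tail_threshold = sorted(non_mated, reverse=True)[k_non_mated - 1]
    low_tail_threshold = sorted(mated)[k_mated - 1]
    return abs(low_tail_threshold - high_tail_threshold)


def rank_one_differences(mated, non_mated):
    """Per-probe (mated - non-mated) rank-one score; inputs are aligned by probe."""
    return [m - n for m, n in zip(mated, non_mated)]


# Toy rank-one similarity scores for four probes (made-up values).
mated_scores = [0.80, 0.90, 0.70, 0.85]
non_mated_scores = [0.30, 0.40, 0.20, 0.35]

print(d_prime(mated_scores, non_mated_scores))
print(tail_threshold_gap(mated_scores, non_mated_scores, tail_fraction=0.25))
print(rank_one_differences(mated_scores, non_mated_scores))
```

Computed per demographic group, a smaller d', a smaller tail gap, or a difference distribution shifted toward zero would each indicate that mated and non-mated rank-one scores are harder to separate for that group.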

Paper Structure (12 sections, 1 equation, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Does 1-to-many rank-one match error rate vary across demographics? In 1-to-many identification, an image of a person with unknown identity (the probe) is matched against a list of persons with known identity (the gallery) to find a candidate identity.
  • Figure 2: Baseline 1-to-1 (verification) and 1-to-many (identification) distributions. Top row shows 1-to-1 impostor and genuine distributions; d-prime is greatest for African-American Male and lowest for Caucasian Female. Bottom row shows 1-to-many mated and non-mated distributions for ArcFace (results for AdaFace and MagFace in Table \ref{tab:d-prime}); d-prime is greatest (lowest false positive identification rate) for African-American Male and lowest (highest false positive identification rate) for African-American Female. In the legend of figures in the top row, the labels "arc," "ada," and "mag" correspond to the results obtained using the ArcFace, AdaFace, and MagFace loss functions, respectively.
  • Figure 3: (mated - non-mated) distributions by demographic for three face matchers. The X-axis represents the "Relative Frequency", while the Y-axis represents the "Difference of mated and non-mated Rank1 match score".
  • Figure 4: Sample probe images with increasing Gaussian blur levels.
  • Figure 5: Impact of probe image blur on one-to-many matching accuracy. The top row shows the results for ArcFace, the middle row for AdaFace, and the bottom row for MagFace. The impact of blurred probe images remains consistent across the matchers. In all plots, the X-axis represents "Relative Frequency," while the Y-axis represents "Match Scores."
  • ...and 3 more figures
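
The probe degradations referred to in the captions above (Gaussian blur and reduced face resolution) can be approximated as follows. This is a rough sketch under stated assumptions, not the paper's pipeline: the blur levels and the downsampling method used by the authors are not given in this summary, so the `sigma` values, the 3-sigma kernel radius, the border clamping, and the block-averaging in `reduce_resolution` are all illustrative choices. Images are plain nested lists of grayscale values to keep the example dependency-free.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Normalized 1-D Gaussian weights; radius defaults to ceil(3 * sigma)."""
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve each row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[k] * row[min(max(x + k - radius, 0), width - 1)]
             for k in range(len(kernel)))
         for x in range(width)]
        for row in image
    ]


def transpose(image):
    return [list(column) for column in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    blurred_rows = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(blurred_rows), kernel))


def reduce_resolution(image, factor):
    """Average non-overlapping factor x factor blocks (simple box downsampling)."""
    height, width = len(image) // factor, len(image[0]) // factor
    return [
        [sum(image[y * factor + dy][x * factor + dx]
             for dy in range(factor) for dx in range(factor)) / factor ** 2
         for x in range(width)]
        for y in range(height)
    ]


# Toy 9x9 "probe" with a single bright pixel in the center.
probe = [[0.0] * 9 for _ in range(9)]
probe[4][4] = 1.0

blurred = gaussian_blur(probe, sigma=1.0)
smaller = reduce_resolution(probe, factor=3)
print(blurred[4][4], len(smaller), len(smaller[0]))
```

In an experiment of the kind summarized here, degraded probes like these would be matched against an undegraded gallery, and the score metrics recomputed for each blur or resolution level.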