Mask-up: Investigating Biases in Face Re-identification for Masked Faces

Siddharth D Jaiswal, Ankit Kr. Verma, Animesh Mukherjee

TL;DR

The study addresses biases in face re-identification under mask occlusion by conducting a large-scale adversarial audit across 13 FRS platforms (4 commercial, 9 open-source) over five diverse datasets, with masks generated via MaskTheFace. It also includes a two-part human-subject survey to compare human performance against automated systems and uses Grad-CAM to explain open-source model behavior under occlusion. Key findings reveal substantial race- and gender-based disparities, mask-type effects on accuracy, and partial robustness in some open-source models, while human performance remains biased and not reliably scalable as a fix. The work argues for ongoing, domain-specific audits and cautious deployment policies, highlighting that human-in-the-loop approaches alone are insufficient to ensure equitable, responsible use of masked-face re-identification technologies.

Abstract

AI-based Face Recognition Systems (FRSs) are now widely distributed and deployed as MLaaS solutions all over the world, more so since the COVID-19 pandemic, for tasks ranging from validating individuals' faces while buying SIM cards to surveillance of citizens. Extensive biases have been reported against marginalized groups in these systems and have led to highly discriminatory outcomes. The post-pandemic world has normalized wearing face masks, but FRSs have not kept up with the changing times. As a result, these systems are susceptible to mask-based face occlusion. In this study, we audit four commercial and nine open-source FRSs for the task of face re-identification between different varieties of masked and unmasked images across five benchmark datasets (total 14,722 images). These simulate a realistic validation/surveillance task as deployed in all major countries around the world. Three of the commercial and five of the open-source FRSs are highly inaccurate; they further perpetuate biases against non-White individuals, with the lowest accuracy being 0%. A survey for the same task with 85 human participants also results in a low accuracy of 40%. Thus, human-in-the-loop moderation in the pipeline does not alleviate the concerns, as has been frequently hypothesized in the literature. Our large-scale study shows that developers, lawmakers and users of such services need to rethink the design principles behind FRSs, especially for the task of face re-identification, taking cognizance of observed biases.
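
To make the audited quantity concrete, the following is a minimal sketch of how per-group accuracy for masked-to-unmasked re-identification could be computed. It is an illustration only, not the authors' pipeline: the cosine-similarity matching rule, the function names (`cosine`, `reidentify`, `accuracy_by_group`), the group labels and the toy two-dimensional embeddings are all assumptions introduced here.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def reidentify(probe, gallery):
    """Return the gallery identity whose embedding is most similar to the probe."""
    return max(gallery, key=lambda identity: cosine(probe, gallery[identity]))


def accuracy_by_group(probes, gallery):
    """Compute re-identification accuracy separately for each group.

    probes:  list of (true_identity, group, embedding) for masked images.
    gallery: dict mapping identity -> embedding of the unmasked image.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for identity, group, embedding in probes:
        totals[group] += 1
        hits[group] += reidentify(embedding, gallery) == identity
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical toy data: made-up embeddings and group labels.
gallery = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
probes = [
    ("a", "group_1", [0.9, 0.1]),
    ("b", "group_1", [0.1, 0.9]),
    ("c", "group_2", [0.9, 0.2]),  # the "mask" shifts this probe towards identity "a"
    ("a", "group_2", [1.0, 0.1]),
]

result = accuracy_by_group(probes, gallery)
print(result)
```

A gap between the per-group values is the kind of disparity an audit would flag. A real audit would replace the toy vectors with embeddings or match decisions returned by each FRS.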

Paper Structure (14 sections, 1 equation, 3 figures, 5 tables)

This paper contains 14 sections, 1 equation, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Images from the CelebSET (row 1), CFD-USA (row 2) and Fairface (row 3) datasets in their original format (leftmost column) and with the surgical, N-95 and cloth (multiple colors) masks, and the Monk Skin Tone scale (last row).
  • Figure 2: Screenshot of the no-deadline experiment on our survey website. Each webpage shows one set of images, and the participant must rate every image's likeness to the masked image on a 5-point Likert-like scale. The experiment with a deadline additionally shows a timer at the top of the page that counts down from 2 minutes.
  • Figure 3: Grad-CAM activation maps of CelebSET images for the task of 1-to-N re-identification on the VGG-Face model with surgical and N-95 masked inputs. The set of images on the left shows correct re-identifications and the set on the right shows incorrect ones. In every triplet, we can see that the mask shifts the region of interest, leading to incorrect re-identification for some images.

Theorems & Definitions (2)

  • Definition 1
  • Definition 2