Accuracy and Fairness of Facial Recognition Technology in Low-Quality Police Images: An Experiment With Synthetic Faces
Maria Cuellar, Hon Kiu To, Arush Mehrotra
TL;DR
This study investigates how five common image degradation factors (contrast, brightness, motion blur, pose shift, and resolution) affect the accuracy and fairness of facial recognition technology (FRT) across race and gender, using synthetic StyleGAN3 faces labeled by FairFace. By evaluating a DeepFace/ArcFace pipeline on 1:$N$ identification tasks with controlled degradation, the authors show that false negatives rise as degradation intensifies, while false positive rates peak near baseline image quality. Error rates are consistently higher for women and Black individuals, with Black females most affected. Despite these fairness concerns, FRT accuracy under challenging conditions remains substantially higher than that of many traditional forensic methods, which points to both its potential utility and the need for validation and regulation. The paper emphasizes that algorithmic accuracy alone is insufficient, and it calls for transparency, proper use, and oversight in FRT deployment to ensure fair and legally robust outcomes.
Abstract
Facial recognition technology (FRT) is increasingly used in criminal investigations, yet most evaluations of its accuracy rely on high-quality images, unlike those often encountered by law enforcement. This study examines how five common forms of image degradation (contrast, brightness, motion blur, pose shift, and resolution) affect FRT accuracy and fairness across demographic groups. Using synthetic faces generated by StyleGAN3 and labeled with FairFace, we simulate degraded images and evaluate performance using DeepFace with ArcFace loss in 1:$N$ identification tasks. We find that false positive rates peak near baseline image quality, while false negatives increase as degradation intensifies, especially with blur and low resolution. Error rates are consistently higher for women and Black individuals, with Black females most affected. These disparities raise concerns about fairness and reliability when FRT is used in real-world investigative contexts. Nevertheless, even under the most challenging conditions and for the most affected subgroups, FRT accuracy remains substantially higher than that of many traditional forensic methods. This suggests that, if appropriately validated and regulated, FRT should be considered a valuable investigative tool. However, algorithmic accuracy alone is not sufficient: we must also evaluate how FRT is used in practice, including user-driven data manipulation. Such practices underscore the need for transparency and oversight in FRT deployment to ensure both fairness and forensic validity.
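To make the two error types in a 1:$N$ identification task concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes that each face is already reduced to an embedding vector, that matching uses cosine similarity with a fixed threshold, that a false positive is a returned identity that is not the probe's true identity, and that a false negative is a probe whose true identity is in the gallery but for which no match is returned. The function names, the toy embeddings, and the threshold of 0.8 are all hypothetical; the abstract does not specify the scoring rules used in the experiment.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(probe, gallery, threshold):
    """1:N search: return the best-matching gallery identity,
    or None if no gallery entry reaches the similarity threshold."""
    best_id, best_score = None, -1.0
    for identity, embedding in gallery.items():
        score = cosine_similarity(probe, embedding)
        if score > best_score:
            best_id, best_score = identity, score
    return best_id if best_score >= threshold else None


def error_rates(probes, gallery, threshold):
    """Return (false_positive_rate, false_negative_rate) over all probes.

    Assumed definitions (illustrative, not taken from the paper):
      false positive: a match is returned, but to the wrong identity
      false negative: the true identity is enrolled, but no match is returned
    """
    false_pos = false_neg = 0
    for true_id, embedding in probes:
        predicted = identify(embedding, gallery, threshold)
        if predicted is None:
            if true_id in gallery:
                false_neg += 1
        elif predicted != true_id:
            false_pos += 1
    return false_pos / len(probes), false_neg / len(probes)


# Toy gallery of enrolled identities (hypothetical 3-D embeddings).
gallery = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0]}

# Probes as (true identity, embedding); the middle two mimic degraded images
# whose embeddings have drifted away from the enrolled one.
probes = [
    ("A", [0.9, 0.1, 0.0]),  # correct match to A
    ("B", [0.7, 0.3, 0.0]),  # drifts toward A: false positive
    ("B", [0.1, 0.6, 0.7]),  # below threshold for everyone: false negative
    ("B", [0.0, 1.0, 0.1]),  # correct match to B
]

rates = error_rates(probes, gallery, threshold=0.8)
print(rates)  # -> (0.25, 0.25)
```

Under these assumptions, raising the threshold trades false positives for false negatives, and computing the same rates separately for each demographic group is one simple way to compare error rates across groups.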
