Can the accuracy bias by facial hairstyle be reduced through balancing the training data?
Kagan Ozturk, Haiyu Wu, Kevin W. Bowyer
TL;DR
This work investigates whether accuracy biases induced by facial hairstyles can be reduced by increasing training data size or by balancing hair distributions in training sets. Through controlled experiments using AdaFace on multiple WebFace-derived scales and a MORPH-based evaluation, the authors show that while overall recognition improves with more data, the accuracy gap between clean-shaven and facial-hair image pairs persists, and balancing training data does not eliminate this gap. They further test data augmentation that alters beard and mustache regions, observing some gains but no fundamental elimination of cross-hair bias, with effects differing across races. The findings imply that hairstyle-related fairness issues are not solved by data quantity or simple balancing, underscoring the importance of rigorous bias evaluation and more robust mitigation strategies in face recognition systems.
Abstract
The appearance of a face can be greatly altered by growing a beard and mustache. The facial hairstyles in a pair of images can cause marked changes to the impostor distribution and the genuine distribution. Also, different distributions of facial hairstyle across demographics could cause a false impression of relative accuracy across demographics. We first show that, even though larger training sets boost recognition accuracy on all facial hairstyles, accuracy variations caused by facial hairstyles persist regardless of the size of the training set. Then, we analyze the impact of having different fractions of the training data represent facial hairstyles. We created balanced training sets using a set of identities available in WebFace42M that have both clean-shaven and facial hair images. We find that, even when a face recognition model is trained with a balanced clean-shaven / facial hair training set, accuracy variation on the test data does not diminish. Next, data augmentation is employed to further investigate the effect of facial hair distribution in training data by manipulating facial hair pixels with the help of facial landmark points and a facial hair segmentation model. Our results show that facial hair causes an accuracy gap between clean-shaven and facial hair images, and that this impact can be significantly different between African-Americans and Caucasians.
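To make the idea of a balanced clean-shaven / facial hair training set concrete, the following is a minimal sketch of one way such a set could be assembled. It is not the authors' procedure: the record layout `(identity, image_id, has_facial_hair)`, the function name `build_balanced_set`, and the per-identity equal-count sampling rule are all assumptions made for illustration, and the facial hair label is assumed to come from some upstream attribute classifier.

```python
import random
from collections import defaultdict


def build_balanced_set(records, seed=0):
    """Return a subset with equal clean-shaven and facial-hair images per identity.

    `records` is an iterable of (identity, image_id, has_facial_hair) tuples.
    This layout is hypothetical; it is not taken from the paper.
    """
    rng = random.Random(seed)

    # Group image ids by identity and by facial hair label.
    by_identity = defaultdict(lambda: {True: [], False: []})
    for identity, image_id, has_hair in records:
        by_identity[identity][bool(has_hair)].append(image_id)

    balanced = []
    for identity in sorted(by_identity):
        groups = by_identity[identity]
        # Keep only identities that have both kinds of images.
        n = min(len(groups[True]), len(groups[False]))
        if n == 0:
            continue
        # Sample the same number of images from each group.
        for has_hair in (False, True):
            for image_id in rng.sample(groups[has_hair], n):
                balanced.append((identity, image_id, has_hair))
    return balanced


# Toy example: identity "B" has no facial-hair images and is dropped.
toy_records = [
    ("A", "a1", True), ("A", "a2", True), ("A", "a3", True), ("A", "a4", False),
    ("B", "b1", False), ("B", "b2", False),
    ("C", "c1", True), ("C", "c2", True), ("C", "c3", False), ("C", "c4", False),
]
balanced_toy = build_balanced_set(toy_records)
```

Balancing within each identity, rather than only across the whole set, keeps the two hairstyle groups tied to the same people, so any remaining accuracy gap is less likely to come from identity imbalance. Other balancing rules are possible and may match the paper's setup more closely.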
