Diversity in Faces

Michele Merler; Nalini Ratha; Rogerio S. Feris; John R. Smith

TL;DR

Diversity in Faces (DiF) introduces a large, annotated face dataset designed to quantify intrinsic facial diversity using ten facial coding schemes, computed on $1{,}000{,}000$ images sampled from the YFCC-100M collection. The coding schemes cover craniofacial measures, facial symmetry, facial-region contrast, skin color via the Individual Typology Angle (ITA), predicted attributes (age, gender), subjective annotations, and pose and resolution. Together they support a multi-modal analysis of diversity using Shannon $H$ and Simpson $D/E$ metrics. Key findings show high diversity in craniofacial features and facial-region contrast, whereas pose diversity is comparatively limited as a consequence of sampling. Age and gender are more unevenly distributed, which highlights fairness concerns in current datasets. The paper proposes a practical, extensible framework for assessing and improving data coverage and balance in support of fairer and more accurate face recognition systems. Future directions include cross-dataset comparisons and synthetic data generation to fill the observed gaps.
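
As an illustration of the Shannon and Simpson metrics named in the TL;DR, the following is a minimal sketch rather than the authors' implementation. It assumes the standard textbook definitions: $H = -\sum_i p_i \ln p_i$, Shannon evenness $H / \ln S$, Simpson diversity $D = 1 / \sum_i p_i^2$, and Simpson evenness $D / S$, where $p_i$ is the fraction of samples in class $i$ and $S$ is the number of observed classes. The function name `diversity_metrics` and the example labels are hypothetical.

```python
import math
from collections import Counter


def diversity_metrics(labels):
    """Compute Shannon and Simpson diversity and evenness for categorical labels.

    Assumption: S counts only classes that actually occur in `labels`;
    empty bins of a predefined histogram would have to be handled separately.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")

    probs = [c / total for c in counts.values()]
    num_classes = len(probs)

    # Shannon diversity: H = -sum(p_i * ln(p_i))
    shannon_h = -sum(p * math.log(p) for p in probs)
    # Simpson diversity (inverse form): D = 1 / sum(p_i^2)
    simpson_d = 1.0 / sum(p * p for p in probs)

    # Evenness normalizes each measure by its maximum for S classes.
    shannon_e = shannon_h / math.log(num_classes) if num_classes > 1 else 0.0
    simpson_e = simpson_d / num_classes

    return {
        "shannon_h": shannon_h,
        "shannon_e": shannon_e,
        "simpson_d": simpson_d,
        "simpson_e": simpson_e,
    }


# Hypothetical example: a balanced and a skewed attribute distribution.
balanced = ["a", "b", "c", "d"] * 25
skewed = ["a"] * 90 + ["b"] * 10
print(diversity_metrics(balanced))
print(diversity_metrics(skewed))
```

Under these definitions a perfectly balanced attribute reaches evenness 1, while a skewed attribute scores lower on both diversity and evenness. The TL;DR draws this kind of contrast between the craniofacial measures and the age and gender predictions.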

Abstract

Face recognition is a long-standing challenge in the field of Artificial Intelligence (AI). The goal is to create systems that accurately detect, recognize, verify, and understand human faces. There are significant technical hurdles in making these systems accurate, particularly in unconstrained settings, due to confounding factors related to pose, resolution, illumination, occlusion, and viewpoint. However, with recent advances in neural networks, face recognition has achieved unprecedented accuracy, largely built on data-driven deep learning methods. While this is encouraging, a critical factor limiting the accuracy and fairness of face recognition is inherent facial diversity. Every face is different. Every face reflects something unique about us. Aspects of our heritage (including race, ethnicity, culture, and geography) and our individual identity (age, gender, and other visible manifestations of self-expression) are reflected in our faces. We expect face recognition to work equally accurately for every face. Face recognition needs to be fair. As we rely on data-driven methods to create face recognition technology, we need to ensure the necessary balance and coverage in training data. However, there are still open scientific questions about how to represent and extract pertinent facial features and how to quantitatively measure facial diversity. Towards this goal, Diversity in Faces (DiF) provides a data set of one million annotated human face images for advancing the study of facial diversity. The annotations are generated using ten well-established facial coding schemes from the scientific literature. The facial coding schemes provide human-interpretable quantitative measures of facial features. We believe that by making the extracted coding schemes available on a large set of faces, we can accelerate research and development towards creating fairer and more accurate facial recognition systems.
