Diversity in Faces

Michele Merler, Nalini Ratha, Rogerio S. Feris, John R. Smith

TL;DR

Diversity in Faces (DiF) introduces a large, annotated face dataset designed to quantify intrinsic facial diversity across ten coding schemes, derived from $1{,}000{,}000$ images sampled from the YFCC-100M collection. The work implements craniofacial measures, symmetry, facial-region contrast, skin color via the Individual Typology Angle (ITA), and predicted attributes (age, gender), plus subjective annotations and pose/resolution. Together these enable a multi-modal analysis of diversity using the Shannon diversity index $H$ and the Simpson diversity and evenness measures $D$ and $E$. Key findings show high diversity in craniofacial features and facial-region contrast, while pose diversity is comparatively limited because of how faces were sampled; age and gender exhibit more uneven distributions, highlighting fairness concerns in current datasets. The paper proposes a practical, extensible framework for assessing and improving data coverage and balance to foster fairer and more accurate face-recognition systems, with future directions including cross-dataset comparisons and synthetic data generation to fill observed gaps.
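As a rough illustration of the diversity metrics named above, the sketch below computes them for a list of categorical labels. It assumes the standard textbook definitions, $H = -\sum_i p_i \ln p_i$, $D = 1 / \sum_i p_i^2$, and $E = D / S$ with $S$ the number of observed classes; the summary does not spell these out, so the paper's exact variants may differ. The function name `diversity_metrics` and the example labels are hypothetical.

```python
import math
from collections import Counter


def diversity_metrics(labels):
    """Return (shannon_h, simpson_d, simpson_e) for a sequence of class labels.

    Assumed definitions (standard forms, not confirmed by the summary):
      H = -sum(p_i * ln p_i)
      D = 1 / sum(p_i ** 2)
      E = D / S, where S is the number of classes actually observed
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")

    # Empirical class proportions; every p is > 0 because Counter only
    # stores classes that occur at least once.
    probs = [c / total for c in counts.values()]

    shannon_h = -sum(p * math.log(p) for p in probs)
    simpson_d = 1.0 / sum(p * p for p in probs)
    simpson_e = simpson_d / len(probs)
    return shannon_h, simpson_d, simpson_e


# Hypothetical example: four equally frequent age bins.
balanced = ["0-20", "21-40", "41-60", "61+"] * 25
h, d, e = diversity_metrics(balanced)
```

For a perfectly balanced attribute, $D$ equals the number of classes and $E$ equals 1; a skewed attribute drives both down, which is the kind of imbalance the summary reports for age and gender.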

Abstract

Face recognition is a long-standing challenge in the field of Artificial Intelligence (AI). The goal is to create systems that accurately detect, recognize, verify, and understand human faces. There are significant technical hurdles in making these systems accurate, particularly in unconstrained settings due to confounding factors related to pose, resolution, illumination, occlusion, and viewpoint. However, with recent advances in neural networks, face recognition has achieved unprecedented accuracy, largely built on data-driven deep learning methods. While this is encouraging, a critical aspect that is limiting facial recognition accuracy and fairness is inherent facial diversity. Every face is different. Every face reflects something unique about us. Aspects of our heritage (including race, ethnicity, culture, and geography) and our individual identity (age, gender, and other visible manifestations of self-expression) are reflected in our faces. We expect face recognition to work equally accurately for every face. Face recognition needs to be fair. As we rely on data-driven methods to create face recognition technology, we need to ensure necessary balance and coverage in training data. However, there are still scientific questions about how to represent and extract pertinent facial features and quantitatively measure facial diversity. Towards this goal, Diversity in Faces (DiF) provides a data set of one million annotated human face images for advancing the study of facial diversity. The annotations are generated using ten well-established facial coding schemes from the scientific literature. The facial coding schemes provide human-interpretable quantitative measures of facial features. We believe that by making the extracted coding schemes available on a large set of faces, we can accelerate research and development towards creating fairer and more accurate facial recognition systems.

Paper Structure

This paper contains 31 sections, 15 figures, 12 tables.

Figures (15)

  • Figure 1: Each candidate photo from YFCC-100M was processed by first detecting the depicted faces with a Convolutional Neural Network (CNN) using the Faster-RCNN based object detector fasterrcnn_NIPS15. Each detected face, as in (a), was then processed using DLIB DLIB09 to extract pose and landmark points, as shown in (b), and subsequently assessed based on the width and height of the face region. Faces with a region size smaller than $50 \times 50$ pixels or an inter-ocular distance of less than $30$ pixels were discarded. Faces with non-frontal pose, or anything beyond being slightly tilted to the left or the right, were also discarded. Finally, an affine transformation was performed using the center points of both eyes, and the face was rectified as shown in (c).
  • Figure 2: We used the $68$ key-points extracted using DLIB from each face (small dots) to localize $19$ facial landmarks (large dots, labeled), out of the 47 introduced in anthropometry_book94. Those 19 landmarks were employed as the basis for extraction of the craniofacial measures for coding schemes 1--3.
  • Figure 3: Process for extracting facial symmetry measures for coding scheme 4, starting with (a) the rectified face showing the face mid-line, the reference points for the inner canthi (C1 and C2) and philtrum (C3), and the line segments connecting them (point $a$ for C1-C2 and point $b$ connecting C3 to the midpoint of point $a$). Additionally, a Sobel filter is used to extract (b) edge magnitude and (c) edge orientation, from which the measure for edge orientation similarity is derived.
  • Figure 4: Process for extracting facial regions contrast measures for coding scheme 5. The computation is based on the average pixel intensity differences between the outer and inner regions for the lips, eyes and eyebrows as depicted above.
  • Figure 5: Process for extracting skin color for coding scheme 6 based on the Individual Typology Angle (ITA): (a) input face, (b) skin map, (c) $L$ channel, (d) $a$ channel, (e) $b$ channel, (f) ITA map, (g) masked ITA map, (h) ITA histogram.
  • ...and 10 more figures
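The Figure 5 caption names the ITA but does not state its formula. A minimal sketch follows, assuming the commonly used definition $\mathrm{ITA} = \arctan\!\big((L - 50)/b\big) \cdot 180/\pi$ over CIELAB $L$ and $b$ values, and assuming that the per-face value is summarised by averaging over skin-masked pixels. The helper names `ita_degrees` and `mean_ita`, the pixel representation, and the averaging step are illustrative assumptions, not the paper's confirmed pipeline.

```python
import math


def ita_degrees(l_value, b_value):
    """Individual Typology Angle in degrees for one CIELAB pixel.

    Assumes the common definition ITA = atan((L - 50) / b) * 180 / pi.
    atan2 is used so that b == 0 does not divide by zero; for b > 0,
    which is typical of skin pixels, it equals the atan form.
    """
    return math.degrees(math.atan2(l_value - 50.0, b_value))


def mean_ita(pixels, skin_mask):
    """Average ITA over the pixels flagged as skin.

    `pixels` is a list of (L, b) pairs and `skin_mask` a parallel list of
    booleans; both are hypothetical stand-ins for the skin map and the
    L and b channel images shown in the figure.
    """
    values = [ita_degrees(l, b) for (l, b), keep in zip(pixels, skin_mask) if keep]
    if not values:
        raise ValueError("skin mask selects no pixels")
    return sum(values) / len(values)


# Hypothetical pixels: two skin pixels and one masked-out background pixel.
example_pixels = [(75.0, 25.0), (50.0, 20.0), (10.0, 5.0)]
example_mask = [True, True, False]
average = mean_ita(example_pixels, example_mask)
```

Under this definition, higher ITA values correspond to lighter skin and lower values to darker skin, so a histogram of ITA values like the one in panel (h) gives a continuous view of skin tone.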