Investigating Labeler Bias in Face Annotation for Machine Learning

Luke Haliburton; Sinksar Ghebremedhin; Robin Welsch; Albrecht Schmidt; Sven Mayer

TL;DR

A study that investigates and measures labeler bias using images of people of different ethnicities and sexes in a labeling task. It shows that participants hold stereotypes that influence their decision-making and that labeler demographics affect the assigned labels.

Abstract

In a world increasingly reliant on artificial intelligence, it is more important than ever to consider the ethical implications of artificial intelligence for humanity. One key under-explored challenge is labeler bias, which can produce inherently biased training datasets and, in turn, lead to inaccurate or unfair decisions in healthcare, employment, education, and law enforcement. Hence, we conducted a study to investigate and measure the existence of labeler bias using images of people from different ethnicities and sexes in a labeling task. Our results show that participants possess stereotypes that influence their decision-making process and that labeler demographics impact assigned labels. We also discuss how labeler bias influences datasets and, subsequently, the models trained on them. Overall, a high degree of transparency must be maintained throughout the entire artificial intelligence training process to identify and correct biases in the data as early as possible.

Paper Structure

This paper contains 18 sections, 5 figures, and 2 tables.

Introduction
Related Work
Bias in Machine Learning
Historical and Sampling Bias
Labeler Bias
Stereotype Content Model
Method
Dataset Preprocessing and Portrait Selection
Participants
Study Procedure
Results
The Impact of Stereotypes on Estimations (RQ1)
The Impact of Demographics on Estimations (RQ2)
Discussion
Labelers Exhibit Bias
(3 further sections not listed)

Figures (5)

Figure 1: We implemented a script to exclude non-frontal faces and images containing more than one person.
Figure 2: The average participant income by ethnicity and age. A Pearson correlation showed that income variation had no significant impact on the results. Amounts are shown in GBP (£) as this is the currency used by Prolific.
Figure 3: Warmth-Competence ratings displayed from the perspective of the portraits and the labelers, including a 95% confidence interval. All ratings are clustered near neutral (3) for both warmth and competence.
Figure 4: Correlation between mean status and income. Each subplot represents a portrait ethnicity and the points in each plot show how labelers of each ethnicity rated the portraits.
Figure 5: Estimated income as a function of Labeler$_{Ethnicity}$ and Portrait$_{Ethnicity}$. Grey borders indicate the cases where Labeler$_{Ethnicity}$ and Portrait$_{Ethnicity}$ match.