Measuring Machine Learning Harms from Stereotypes Requires Understanding Who Is Harmed by Which Errors in What Ways

Angelina Wang, Xuechunzi Bai, Solon Barocas, Su Lin Blodgett

TL;DR

This work links social psychology of stereotypes to ML fairness by empirically assessing how different error types in image-search object recognition produce harms. It combines broad human annotation of stereotypes with four experiments that separate pragmatic harms (beliefs/behaviors) from experiential harms (negative affect) and compare stereotype-reinforcing, -violating, and neutral errors. The key finding is that stereotype-reinforcing errors yield experiential harm, especially for women, while pragmatic harms are not strongly detected in short-term lab settings; stereotype-violating errors can also cause harm, notably for men with wearable items. The results argue for a nuanced fairness approach that accounts for who is harmed by which errors and why, challenging one-size-fits-all mitigation and suggesting context- and group-sensitive cost frameworks.

Abstract

As machine learning applications proliferate, we need an understanding of their potential for harm. However, current fairness metrics are rarely grounded in human psychological experiences of harm. Drawing on the social psychology of stereotypes, we use a case study of gender stereotypes in image search to examine how people react to machine learning errors. First, we use survey studies to show that not all machine learning errors reflect stereotypes nor are equally harmful. Then, in experimental studies we randomly expose participants to stereotype-reinforcing, -violating, and -neutral machine learning errors. We find stereotype-reinforcing errors induce more experientially (i.e., subjectively) harmful experiences, while having minimal changes to cognitive beliefs, attitudes, or behaviors. This experiential harm impacts women more than men. However, certain stereotype-violating errors are more experientially harmful for men, potentially due to perceived threats to masculinity. We conclude that harm cannot be the sole guide in fairness mitigation, and propose a nuanced perspective depending on who is experiencing what harm and why.

Paper Structure

This paper contains 34 sections, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Summary of our studies. The left side represents studies 1 and 2, where we ask human participants to mark which of the relevant objects in our application are stereotypically associated with which gender groups, and to explain qualitatively why that is and why it is or is not harmful. The right side represents studies 3 and 4, where we randomly expose participants to machine learning errors that are stereotype-reinforcing, stereotype-violating, or stereotype-neutral, as determined by the annotations from study 1. We then measure the two forms of harm we introduce: pragmatic (measurable changes in someone's cognitive beliefs, attitudes, or behaviors toward the group being stereotyped) and experiential (self-reports of negative affect). The images shown are example misclassifications of oven: the error is stereotype-reinforcing when oven is falsely predicted on an image of a woman, stereotype-violating when it is falsely predicted on an image of a man, and stereotype-neutral when no gender stereotype is invoked. These images are shown to participants on a search results page for oven.
  • Figure 2: COCO and Open Images object recognition datasets. We use two widely used image recognition datasets to represent the application of a photo search engine. Both datasets contain annotations for the perceived binary gender expression of the people in the images and for the objects present in each image. The left panel shows an example image from COCO annotated with surfboard. The right panel shows an example image from Open Images annotated with objects such as tree and dog.
  • Figure 3: Study 1 Object Results. Detailed participant responses for each of the 80 objects in the COCO dataset. Each fraction is the number of participants who marked the object as stereotypically associated with women or with men, out of the number of participants asked about that object.
  • Figure 4: Study 3, 4 Results. The effect sizes and 95% confidence intervals are reported for 10 of our 11 measures of pragmatic harm (for the behavior measure of captioning, we provide a descriptive analysis), experiential harm on COCO, and experiential harm on our larger dataset of OpenImages. Deviations from zero indicate that exposure to the stereotype-reinforcing stimulus resulted in our measured harm compared to exposure to the control condition.
  • Figure 5: Study 3 Stimuli. Our three stimuli are shown for the conditions stereotype-reinforcing, stereotype-violating, and neutral. All three are image search results that differ from each other only minimally, and each result indicates whether the search query is pictured in the image, i.e., whether the image search retrieval was correct. The teal and orange squares indicate the only difference between the stimuli, since every image that contains an oven also contains a bowl, and every image that does not contain an oven also does not contain a bowl. This was a deliberate choice to control for potential confounding factors from the images in the study.
  • ...and 2 more figures
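
The captions for Figures 1 and 3 describe how study 1 annotations determine whether a machine learning error counts as stereotype-reinforcing, stereotype-violating, or stereotype-neutral. The following is a minimal sketch of that labeling logic, not the authors' implementation. The `ObjectAnnotation` class, the function names, the 0.5 threshold, and the participant counts are all hypothetical and chosen only for illustration; the paper's actual decision rule may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectAnnotation:
    """Hypothetical study-1 tally for one object."""
    marked_women: int  # participants who marked the object as associated with women
    marked_men: int    # participants who marked the object as associated with men
    asked: int         # participants asked about this object


def stereotyped_gender(ann: ObjectAnnotation, threshold: float = 0.5) -> Optional[str]:
    """Return the gender group the object is stereotypically tied to, if any.

    The threshold is an assumption made for this sketch.
    """
    frac_women = ann.marked_women / ann.asked
    frac_men = ann.marked_men / ann.asked
    if frac_women >= threshold and frac_women > frac_men:
        return "women"
    if frac_men >= threshold and frac_men > frac_women:
        return "men"
    return None


def error_type(ann: ObjectAnnotation, predicted_on: str, threshold: float = 0.5) -> str:
    """Label a false-positive prediction of the object on an image of `predicted_on`."""
    group = stereotyped_gender(ann, threshold)
    if group is None:
        return "neutral"
    return "reinforcing" if group == predicted_on else "violating"


# Hypothetical counts, not taken from the paper.
oven = ObjectAnnotation(marked_women=8, marked_men=1, asked=10)
bowl = ObjectAnnotation(marked_women=1, marked_men=1, asked=10)

oven_on_woman = error_type(oven, "women")
oven_on_man = error_type(oven, "men")
bowl_on_woman = error_type(bowl, "women")
```

With these made-up counts, a false oven prediction on an image of a woman is labeled reinforcing, the same error on an image of a man is labeled violating, and a false bowl prediction is labeled neutral, which mirrors the oven and bowl example in the captions.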