Measuring Machine Learning Harms from Stereotypes Requires Understanding Who Is Harmed by Which Errors in What Ways
Angelina Wang, Xuechunzi Bai, Solon Barocas, Su Lin Blodgett
TL;DR
This work links the social psychology of stereotypes to ML fairness by empirically assessing how different types of object-recognition errors in image search produce harm. It combines broad human annotation of stereotypes with four experiments that separate pragmatic harms (changes in beliefs or behaviors) from experiential harms (negative affect) and compare stereotype-reinforcing, -violating, and -neutral errors. The key finding is that stereotype-reinforcing errors yield experiential harm, especially for women, while pragmatic harms are not strongly detected in short-term lab settings; stereotype-violating errors can also cause harm, notably for men when the errors involve wearable items. The results argue for a nuanced approach to fairness that accounts for who is harmed by which errors and why, challenging one-size-fits-all mitigation and suggesting context- and group-sensitive cost frameworks.
Abstract
As machine learning applications proliferate, we need an understanding of their potential for harm. However, current fairness metrics are rarely grounded in human psychological experiences of harm. Drawing on the social psychology of stereotypes, we use a case study of gender stereotypes in image search to examine how people react to machine learning errors. First, we use survey studies to show that not all machine learning errors reflect stereotypes, nor are all errors equally harmful. Then, in experimental studies, we randomly expose participants to stereotype-reinforcing, -violating, and -neutral machine learning errors. We find that stereotype-reinforcing errors are more experientially (i.e., subjectively) harmful, while producing minimal changes in cognitive beliefs, attitudes, or behaviors. This experiential harm impacts women more than men. However, certain stereotype-violating errors are more experientially harmful for men, potentially due to perceived threats to masculinity. We conclude that harm cannot be the sole guide in fairness mitigation, and propose a nuanced perspective depending on who is experiencing what harm and why.
