Racial/Ethnic Categories in AI and Algorithmic Fairness: Why They Matter and What They Represent
Jennifer Mickel
TL;DR
The paper addresses the problem that racial/ethnic categories used in AI fairness work are adopted without justification, and the process by which people are racialized into them goes undocumented. It introduces CIRCSheets, a structured framework for recording the categories chosen, the racialization process, cultural context, multiracial identities, and researcher positionality. Through a case study on COMPAS, the paper demonstrates how undocumented category choices can obscure important subgroups and skew fairness assessments. The work advances transparency, enabling more reliable auditing and context-appropriate deployment of fairness tools.
Abstract
Racial diversity is increasingly discussed in the AI and algorithmic fairness literature, yet little attention has been paid to justifying the choice of racial categories or to understanding how people are racialized into those categories. Even less attention has been given to how racial categories shift, and how the racialization process changes, depending on the context of a dataset or model. An unclear understanding of who comprises the chosen racial categories and how people are racialized into them can lead to varying interpretations of these categories. These interpretations can cause harm when the assumed categories and racialization process are misaligned with the categories and racialization process actually used. Harm can also arise when the racialization process and racial categories used are irrelevant to, or do not exist in, the context in which they are applied. In this paper, we make two contributions. First, we demonstrate how racial categories with unclear assumptions and little justification can yield different datasets depending on how the categories are interpreted; these datasets poorly represent groups that the given categories obscure or leave out, and models trained on them perform poorly on these groups. Second, we develop CIRCSheets, a framework for documenting the choices and assumptions behind selecting racial categories and the process of racialization into them. CIRCSheets aims to make transparent the processes and assumptions that dataset and model developers rely on when selecting or using these categories.
