Beyond Behaviorist Representational Harms: A Plan for Measurement and Mitigation

Jennifer Chien, David Danks

TL;DR

The paper addresses the gap in fairness research by moving beyond allocative harms to a broader, non-behaviorist account of representational harms that affect cognitive, affective, and emotional states. It proposes a practical measurement framework, grounded in psychology and participatory methods, and demonstrates its application through a case study of large language models in conversational contexts. Key contributions include a taxonomy of non-behaviorist harms, guidance on high-level measurement requirements, and mitigations such as seamful design and counter-narratives, plus consideration of morally defensible cases. The work aims to translate fairness insights into actionable measurement and mitigation praxis for real-world AI systems, highlighting LLMs as particularly vulnerable to unmeasured harms and the need for accountability throughout deployment.

Abstract

Algorithmic harms are commonly categorized as either allocative or representational. This study specifically addresses the latter, focusing on an examination of current definitions of representational harms to discern what is included and what is not. This analysis motivates our expansion beyond behavioral definitions to encompass harms to cognitive and affective states. The paper outlines high-level requirements for measurement: identifying the necessary expertise to implement this approach and illustrating it through a case study. Our work highlights the unique vulnerabilities of large language models to perpetrating representational harms, particularly when these harms go unmeasured and unmitigated. The work concludes by presenting proposed mitigations and delineating when to employ them. The overarching aim of this research is to establish a framework for broadening the definition of representational harms and to translate insights from fairness research into practical measurement and mitigation praxis.

Paper Structure (23 sections, 2 figures, 4 tables)