AI red-teaming is a sociotechnical problem: on values, labor, and harms

Tarleton Gillespie, Ryland Shaw, Mary L. Gray, Jina Suh

TL;DR

This paper argues that AI red-teaming is not just a technical evaluation procedure but a sociotechnical labor practice shaped by value judgments and governance challenges. It reviews the historical roots of red-teaming, contrasts internal, outsourced, and volunteer labor models, and links harm taxonomies to lessons learned from content moderation. The authors call for a coordinated, cross-disciplinary research program to study red-teaming, covering its labor arrangements, ethical implications, and effects on worker well-being, with the aim of improving safety, accountability, and worker protections. The work cautions against opaque, performative safety practices and suggests building a public understanding of red-teaming that aligns industry incentives with societal values.

Abstract

As generative AI technologies find more and more real-world applications, testing their performance and safety seems paramount. "Red-teaming" has quickly become the primary approach to testing AI models, prioritized by AI companies and enshrined in AI policy and regulation. Members of red teams act as adversaries, probing AI systems to test their safety mechanisms and uncover vulnerabilities. Yet we know far too little about this work or its implications. This essay calls for collaboration between computer scientists and social scientists to study the sociotechnical systems surrounding AI technologies, including the work of red-teaming, to avoid repeating the mistakes of the recent past. We highlight the importance of understanding the values and assumptions behind red-teaming, the labor arrangements involved, and the psychological impacts on red-teamers, drawing insights from the lessons learned around the work of content moderation.

Paper Structure

This paper contains 5 sections.