Red Teaming AI Red Teaming

Subhabrata Majumdar, Brian Pendleton, Abhishek Gupta

TL;DR

The paper identifies a critical gap in AI red teaming: current efforts overemphasize model-specific vulnerabilities at the expense of sociotechnical and lifecycle-wide risks. It proposes a two-level framework (macro-level system red teaming across the full AI development lifecycle, and micro-level model red teaming) to surface and mitigate emergent risks. Drawing on cybersecurity practices and systems theory, it offers six recommendations, including a systems-theoretic perspective, integration with TEVV (test, evaluation, verification, and validation), coordinated disclosure, bidirectional feedback, threat modeling for emergent risks, and behavioral drift monitoring, illustrated by a healthcare AI deployment case. The work advocates cross-functional, iterative, and continuous evaluation to improve the governance, safety, and reliability of AI systems in real-world environments.

Abstract

Red teaming has evolved from its origins in military applications to become a widely adopted methodology in cybersecurity and AI. In this paper, we take a critical look at the practice of AI red teaming. We argue that despite its current popularity in AI governance, there exists a significant gap between red teaming's original intent as a critical thinking exercise and its narrow focus on discovering model-level flaws in the context of generative AI. Current AI red teaming efforts focus predominantly on individual model vulnerabilities while overlooking the broader sociotechnical systems and emergent behaviors that arise from complex interactions between models, users, and environments. To address this deficiency, we propose a comprehensive framework operationalizing red teaming in AI systems at two levels: macro-level system red teaming spanning the entire AI development lifecycle, and micro-level model red teaming. Drawing on cybersecurity experience and systems theory, we further propose a set of six recommendations. In these, we emphasize that effective AI red teaming requires multifunctional teams that examine emergent risks, systemic vulnerabilities, and the interplay between technical and social factors.

Paper Structure

This paper contains 32 sections, 1 figure.

Figures (1)

  • Figure 1: Technical and nontechnical components of an ML system, with components having potential trust considerations marked by blue circles. Reproduced with permission from pruksachatkun2022practicing.