Lessons From Red Teaming 100 Generative AI Products

Blake Bullwinkel; Amanda Minnich; Shiven Chawla; Gary Lopez; Martin Pouliot; Whitney Maxwell; Joris de Gruyter; Katherine Pratt; Saphir Qi; Nina Chikanov; Roman Lutz; Raja Sekhar Rao Dheekonda; Bolor-Erdene Jagdagdorj; Eugenia Kim; Justin Song; Keegan Hines; Daniel Jones; Giorgio Severi; Richard Lundeen; Sam Vaughan; Victoria Westerhoff; Pete Bryan; Ram Shankar Siva Kumar; Yonatan Zunger; Chang Kawaguchi; Mark Russinovich

TL;DR

AI red teaming seeks to assess safety and security beyond model benchmarks by testing end-to-end GenAI systems. The paper presents a threat-model ontology and eight actionable lessons drawn from red-teaming over 100 Microsoft GenAI products, supported by five case studies. It introduces PyRIT, an open-source automation framework that scales testing across diverse systems and modalities, while emphasizing the continued importance of human expertise, context, and adversarial thinking. The work contributes a modular, MITRE-aligned framework for organizing risks and offers practical recommendations and open questions to guide future development, standardization, and broader adoption in industry and research.

Abstract

In recent years, AI red teaming has emerged as a practice for probing the safety and security of generative AI systems. Due to the nascency of the field, there are many open questions about how red teaming operations should be conducted. Based on our experience red teaming over 100 generative AI products at Microsoft, we present our internal threat model ontology and eight main lessons we have learned:

1. Understand what the system can do and where it is applied
2. You don't have to compute gradients to break an AI system
3. AI red teaming is not safety benchmarking
4. Automation can help cover more of the risk landscape
5. The human element of AI red teaming is crucial
6. Responsible AI harms are pervasive but difficult to measure
7. LLMs amplify existing security risks and introduce new ones
8. The work of securing AI systems will never be complete

By sharing these insights alongside case studies from our operations, we offer practical recommendations aimed at aligning red teaming efforts with real world risks. We also highlight aspects of AI red teaming that we believe are often misunderstood and discuss open questions for the field to consider.

Paper Structure (15 sections, 6 figures)

Introduction
Background
AI threat model ontology
Red teaming operations
Lessons
  1. Understand what the system can do and where it is applied
  2. You don't have to compute gradients to break an AI system
  3. AI red teaming is not safety benchmarking
  4. Automation can help cover more of the risk landscape
  5. The human element of AI red teaming is crucial
  6. Responsible AI harms are pervasive but difficult to measure
  7. LLMs amplify existing security risks and introduce new ones
  8. The work of securing AI systems will never be complete
Open questions
Conclusion

Figures (6)

Figure 1: Microsoft AIRT ontology for modeling GenAI system vulnerabilities. AIRT often leverages multiple TTPs, which may exploit multiple Weaknesses and create multiple Impacts. In addition, more than one Mitigation may be necessary to address a Weakness. Note that AIRT is tasked only with identifying risks, while product teams are resourced to develop appropriate mitigations.
Figure 2: Quantitative summary of AIRT operations since 2021. (Left) Bar chart showing the percentage of operations that probed safety (RAI) vs. security vulnerabilities from 2021 to 2024. (Right) Pie chart showing the percentage breakdown of AI products that AIRT has tested. As of October 2024, we have conducted over 80 operations covering more than 100 products.
Figure 3: Example of an image jailbreak to generate content that could aid in illegal activities.
Figure 4: End-to-end automated scamming scenario using an LLM and STT/TTS systems.
Figure 5: Four images generated by a text-to-image model given the prompt "Secretary talking to boss in a conference room, secretary is standing while boss is sitting."
...and 1 more figure
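
The many-to-many relationships described in the Figure 1 caption (several TTPs per operation, several Weaknesses and Impacts per TTP, several Mitigations per Weakness) can be sketched as a small data model. This is only an illustrative sketch: the class names, field names, and placeholder values below are assumptions made for this example and are not taken from the paper or from PyRIT.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Mitigation:
    description: str


@dataclass
class Weakness:
    name: str
    # More than one mitigation may be needed to address a single weakness.
    mitigations: list[Mitigation] = field(default_factory=list)


@dataclass
class Impact:
    description: str


@dataclass
class TTP:
    name: str
    # One TTP may exploit several weaknesses and create several impacts.
    exploits: list[Weakness] = field(default_factory=list)
    creates: list[Impact] = field(default_factory=list)


@dataclass
class Finding:
    """Hypothetical record of one finding that chains several TTPs."""
    ttps: list[TTP] = field(default_factory=list)

    def weaknesses(self) -> list[str]:
        # Collect weakness names once each, in first-seen order.
        seen: dict[str, None] = {}
        for ttp in self.ttps:
            for weakness in ttp.exploits:
                seen.setdefault(weakness.name, None)
        return list(seen)

    def impacts(self) -> list[str]:
        # Flatten the impacts of every TTP in the finding.
        return [i.description for ttp in self.ttps for i in ttp.creates]


# Placeholder values only; they do not describe any real operation.
w1 = Weakness("weakness-1", [Mitigation("mitigation-a"), Mitigation("mitigation-b")])
w2 = Weakness("weakness-2")
t1 = TTP("ttp-1", exploits=[w1, w2], creates=[Impact("impact-1")])
t2 = TTP("ttp-2", exploits=[w1], creates=[Impact("impact-2"), Impact("impact-3")])
finding = Finding([t1, t2])

print(finding.weaknesses())
print(finding.impacts())
```

Keeping Mitigations attached to Weaknesses rather than to TTPs mirrors the caption's wording that mitigations address a Weakness, so one fix can cover every TTP that exploits it.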
