PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System

Gary D. Lopez Munoz, Amanda J. Minnich, Roman Lutz, Richard Lundeen, Raja Sekhar Rao Dheekonda, Nina Chikanov, Bolor-Erdene Jagdagdorj, Martin Pouliot, Shiven Chawla, Whitney Maxwell, Blake Bullwinkel, Katherine Pratt, Joris de Gruyter, Charlotte Siska, Pete Bryan, Tori Westerhoff, Chang Kawaguchi, Christian Seifert, Ram Shankar Siva Kumar, Yonatan Zunger

TL;DR

This paper details the challenges specific to red teaming generative AI systems, the development and features of PyRIT, and its practical applications in real-world scenarios.

Abstract

Generative Artificial Intelligence (GenAI) is becoming ubiquitous in our daily lives. The increase in computational power and data availability has led to a proliferation of both single- and multi-modal models. As the GenAI ecosystem matures, the need for extensible and model-agnostic risk identification frameworks is growing. To meet this need, we introduce the Python Risk Identification Toolkit (PyRIT), an open-source framework designed to enhance red teaming efforts in GenAI systems. PyRIT is a model- and platform-agnostic tool that enables red teamers to probe for and identify novel harms, risks, and jailbreaks in multimodal generative AI models. Its composable architecture facilitates the reuse of core building blocks and allows for extensibility to future models and modalities. This paper details the challenges specific to red teaming generative AI systems, the development and features of PyRIT, and its practical applications in real-world scenarios.

Paper Structure

This paper contains 26 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overview of PyRIT components. Interfaces that the platform provides are shown on the left; concrete implementations of those interfaces are shown on the right.
  • Figure 2: Reference PyRIT architecture for the red team orchestrator (Adversarial Orchestrator). The user-provided prompt is used to prime the adversarial model and start the conversation. The orchestrator caller can also specify prompt converters to increase the diversity of the attacks. Both the adversarial and target models make API calls to their respective endpoints to generate responses.
  • Figure 3: Comparison of high-risk responses generated by Phi-3 language models before and after several rounds of the “break-fix” cycle. Note that percentages are inflated because prompts used by the AI Red Team were crafted to elicit harmful generations.
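
The flow described in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PyRIT's actual API: the names `Model`, `Converter`, `base64_converter`, `run_adversarial_loop`, and the stub models are all hypothetical, and the stubs stand in for the API calls that real adversarial and target endpoints would receive.

```python
import base64
from typing import Callable, Dict, List

# Hypothetical type aliases for illustration; these are NOT PyRIT interfaces.
Model = Callable[[str], str]      # takes a prompt, returns a response
Converter = Callable[[str], str]  # transforms a prompt before it is sent


def base64_converter(prompt: str) -> str:
    """Example converter (assumed for illustration): Base64-encode the prompt."""
    return base64.b64encode(prompt.encode("utf-8")).decode("ascii")


def run_adversarial_loop(
    seed_prompt: str,
    adversarial_model: Model,
    target_model: Model,
    converters: List[Converter],
    max_turns: int = 3,
) -> List[Dict[str, object]]:
    """Alternate between the adversarial model and the target for max_turns."""
    transcript: List[Dict[str, object]] = []
    # The user-provided prompt primes the adversarial model.
    attack = adversarial_model(seed_prompt)
    for turn in range(max_turns):
        # Apply each converter in order to diversify the attack.
        sent = attack
        for convert in converters:
            sent = convert(sent)
        response = target_model(sent)
        transcript.append(
            {"turn": turn, "attack": attack, "sent": sent, "response": response}
        )
        # Feed the target's response back to produce the next attack.
        attack = adversarial_model(response)
    return transcript


# Stub models standing in for real API endpoints (assumptions, not real services).
def stub_adversary(text: str) -> str:
    return f"follow-up on: {text}"


def stub_target(text: str) -> str:
    return f"echo: {text}"


if __name__ == "__main__":
    for entry in run_adversarial_loop(
        "seed", stub_adversary, stub_target, [base64_converter], max_turns=2
    ):
        print(entry)
```

Treating models and converters as plain callables keeps the loop independent of any particular model or platform, which mirrors the composability the abstract describes; a real orchestrator would additionally need scoring, memory, and a stopping criterion.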