DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling

Boheng Li; Junjie Wang; Yiming Li; Zhiyang Hu; Leyi Qi; Jianshuo Dong; Run Wang; Han Qiu; Zhan Qin; Tianwei Zhang

TL;DR

DREAM reframes red teaming for text-to-image (T2I) models as distribution learning over unsafe prompts, enabling scalable and diverse prompt discovery via energy-based modeling. It introduces GC-SPSA, a gradient-calibrated zeroth-order optimizer, and an inference-time adaptive temperature strategy to efficiently sample a broad unsafe prompt space. In extensive cross-model and cross-filter evaluation, DREAM achieves superior prompt success rates while maintaining diversity comparable to human-written prompts, and demonstrates transferability to commercial platforms. The framework also supports safety tuning and offers insights into the reusability of the red-team LLM across targets. Overall, DREAM provides a principled, scalable way to evaluate and strengthen the safety of T2I systems before real-world deployment.

Abstract

Despite the integration of safety alignment and external filters, text-to-image (T2I) generative systems are still susceptible to producing harmful content, such as sexual or violent imagery. This raises serious concerns about unintended exposure and potential misuse. Red teaming, which aims to proactively identify diverse prompts that can elicit unsafe outputs from the T2I system, is increasingly recognized as an essential method for assessing and improving safety before real-world deployment. However, existing automated red teaming approaches often treat prompt discovery as an isolated, prompt-level optimization task, which limits their scalability, diversity, and overall effectiveness. To bridge this gap, we propose DREAM, a scalable red teaming framework that automatically uncovers diverse problematic prompts for a given T2I system. Unlike prior work that optimizes prompts individually, DREAM directly models the probability distribution of the target system's problematic prompts, which enables explicit optimization over both effectiveness and diversity, and allows efficient large-scale sampling after training. To achieve this without direct access to representative training samples, we draw inspiration from energy-based models and reformulate the objective into a simple and tractable form. We further introduce GC-SPSA, an efficient optimization algorithm that provides stable gradient estimates through the long and potentially non-differentiable T2I pipeline. During inference, we also propose a diversity-aware sampling strategy to enhance prompt variety. The effectiveness of DREAM is validated through extensive experiments, demonstrating state-of-the-art performance across a wide range of T2I models and safety filters in terms of both prompt success rate and diversity. Our code is available at https://github.com/AntigoneRandy/DREAM
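
For readers unfamiliar with zeroth-order optimization, the following is a minimal sketch of the standard SPSA (simultaneous perturbation stochastic approximation) gradient estimator, the general family that the name GC-SPSA points to. It is not the paper's GC-SPSA: the gradient calibration step is not described in this section and is omitted here. The function name `spsa_gradient`, its parameters, and the toy objective in the usage note are assumptions chosen for illustration only.

```python
import numpy as np


def spsa_gradient(f, theta, c=0.1, num_samples=8, rng=None):
    """Plain SPSA estimate of the gradient of a black-box scalar function f.

    Illustrative only: this is textbook SPSA, not the paper's GC-SPSA.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for _ in range(num_samples):
        # Rademacher perturbation: every coordinate is +1 or -1 with equal probability.
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        # Two black-box evaluations; f itself is never differentiated.
        diff = f(theta + c * delta) - f(theta - c * delta)
        # Entries of delta are +/-1, so dividing by delta equals multiplying by it.
        grad += diff / (2.0 * c) * delta
    # Average the single-perturbation estimates to reduce variance.
    return grad / num_samples
```

As a usage example, `spsa_gradient(lambda x: float(np.sum(x ** 2)), np.array([1.0, 2.0]), num_samples=4000)` returns a noisy estimate of the true gradient `[2.0, 4.0]`. Each estimate costs two evaluations of `f` regardless of dimension, which is why this family of estimators is attractive when the objective passes through a long or non-differentiable pipeline.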
