DREAM: Dynamic Red-teaming across Environments for AI Models

Liming Lu, Xiang Gu, Junyu Huang, Jiawei Du, Yunhuai Liu, Yongbin Zhou, Shuchao Pang

TL;DR

DREAM is an automated, cross-environment red-teaming framework that combines a Cross-Environment Adversarial Knowledge Graph (CE-AKG) with Contextualized Guided Policy Search (C-GPS) to construct long, causally linked attack chains against LLM-powered agents. The framework exposes systemic vulnerabilities that static, single-environment evaluations miss, particularly contextual fragility and the inability to track malicious intent over multi-step interactions. In a large-scale evaluation of 12 agents across 349 environments and 1,986 atom attacks, DREAM demonstrates a domino effect in which attack efficacy grows super-linearly with chain length and rises further as attacks pivot across more environments. Static defenses prove largely ineffective against such dynamic, stateful threats, which underscores the need for context-aware safety strategies. DREAM is released as a reproducible benchmark for advancing agent safety research.

Abstract

Large Language Models (LLMs) are increasingly used in agentic systems, where their interactions with diverse tools and environments create complex, multi-stage safety challenges. However, existing benchmarks mostly rely on static, single-turn assessments that miss vulnerabilities from adaptive, long-chain attacks. To fill this gap, we introduce DREAM, a framework for systematic evaluation of LLM agents against dynamic, multi-stage attacks. At its core, DREAM uses a Cross-Environment Adversarial Knowledge Graph (CE-AKG) to maintain stateful, cross-domain understanding of vulnerabilities. This graph guides a Contextualized Guided Policy Search (C-GPS) algorithm that dynamically constructs attack chains from a knowledge base of 1,986 atomic actions across 349 distinct digital environments. Our evaluation of 12 leading LLM agents reveals a critical vulnerability: these attack chains succeed in over 70% of cases for most models, showing the power of stateful, cross-environment exploits. Through analysis of these failures, we identify two key weaknesses in current agents: contextual fragility, where safety behaviors fail to transfer across environments, and an inability to track long-term malicious intent. Our findings also show that traditional safety measures, such as initial defense prompts, are largely ineffective against attacks that build context over multiple interactions. To advance agent safety research, we release DREAM as a tool for evaluating vulnerabilities and developing more robust defenses.
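To make the idea of a stateful, cross-domain store more concrete, the following is a minimal sketch of a structure that records which facts were observed in which environment and lets a planner query them across environments. It is an illustrative assumption, not the paper's CE-AKG implementation: the class name `CrossEnvGraph`, its methods, and the environment and fact names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CrossEnvGraph:
    """Toy stand-in for a cross-environment knowledge store (hypothetical design)."""

    # Maps an environment name to the set of facts observed in it.
    facts_by_env: dict = field(default_factory=lambda: defaultdict(set))

    def record(self, env: str, fact: str) -> None:
        """Store a fact under the environment where it was observed."""
        self.facts_by_env[env].add(fact)

    def known_facts(self) -> set:
        """Return every fact observed so far, regardless of environment."""
        merged = set()
        for facts in self.facts_by_env.values():
            merged |= facts
        return merged

    def sources(self, fact: str) -> list:
        """Return the sorted list of environments in which a fact was observed."""
        return sorted(env for env, facts in self.facts_by_env.items() if fact in facts)


# Placeholder environment and fact names, chosen only for illustration.
graph = CrossEnvGraph()
graph.record("env_a", "fact_1")
graph.record("env_b", "fact_2")
graph.record("env_b", "fact_1")
```

Because the store is keyed by environment but queried globally, a step planned in one environment can depend on a fact first observed in another. That is the kind of state a single-environment evaluation does not carry.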

Paper Structure

This paper contains 61 sections, 6 equations, 7 figures, 10 tables, 1 algorithm.

Figures (7)

  • Figure 1: Comparison between Traditional Benchmarks and Our DREAM. Left: Traditional agent-safety benchmarks evaluate a model within a single environment using static, template-based attacks, offering limited coverage of realistic adversarial behavior. Right: Our proposed DREAM introduces a multi-agent attacker to evaluate models across diverse environments. A centralized Conductor performs cross-environment reasoning, while the Rater and Sandbox jointly evaluate and update attack states. This design enables dynamic, adaptive, multi-environment attack chains, uncovering vulnerabilities that static and single-environment tests fail to reveal.
  • Figure 2: Overview of our DREAM framework. Left: Multi-Agent Atom Attack Generation, where Scout, Seeder, and Exploiter roles create diverse atom attacks for the Atom Attack Library. Center: The Cross-Environment Chain Attack powered by Contextualized Guided Policy Search (C-GPS), with the Conductor agent dynamically planning attack paths through retrieval, clustering, action selection, and failure-aware backtracking. Right: A case study illustrating a three-step attack, leveraging information from one environment to initiate a cross-environment exploit, facilitated by the Cross-Environment Adversarial Knowledge Graph (CE-AKG) to trigger a domino effect and uncover systemic vulnerabilities.
  • Figure 3: "Domino Effect" in Attack Chain Length and Final Score Distribution. The mean score (solid line) shows steep, super-linear growth, significantly outperforming the exponential baseline (dashed line). This divergence is a direct result of the C-GPS algorithm’s ability to construct causally-linked attack sequences, demonstrating that the potency of attacks increases synergistically with chain length.
  • Figure 4: The "Information Bridge" Effect in Attack Chain Length and Environment Count. The mean score (solid line) shows near-linear growth as attacks traverse more environments. This demonstrates the effectiveness of our CE-AKG in fusing disparate information to enable contextually rich, high-impact attacks. The widening distribution of scores (box plots) at higher environment counts indicates that cross-environment attacks unlock opportunities for more severe breaches.
  • Figure 5: Ablation Study on Conductor Capability. This figure plots the final scores against attack chain length for varied LLM Conductors. The results demonstrate that the positive correlation between chain length and attack severity is a consistent trend across all models, confirming the universality of the "domino effect." However, the magnitude of this growth exhibits clear stratification, where more capable models achieve significantly steeper, super-linear trajectories compared to their less advanced counterparts.
  • ...and 2 more figures
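The Figure 2 caption describes the Conductor as planning chains through action selection and failure-aware backtracking. The sketch below illustrates that control flow as a plain depth-first search over abstract actions with preconditions and effects. It is a simplified assumption rather than the C-GPS algorithm itself: the `Atom` fields, the `score` heuristic, the `succeeds` callback, and all action and fact names are hypothetical, and the retrieval and clustering stages are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """An abstract action: usable once `requires` is known, then adds `yields`."""
    name: str
    env: str
    requires: frozenset
    yields: frozenset
    score: float  # hypothetical heuristic used to rank candidates


def build_chain(atoms, goal, succeeds, known=frozenset(), chain=()):
    """Greedy depth-first search that backtracks when a step fails or dead-ends."""
    if goal in known:
        return [a.name for a in chain]
    # Keep unused actions whose preconditions hold and that add something new.
    candidates = [a for a in atoms
                  if a not in chain and a.requires <= known and not a.yields <= known]
    for atom in sorted(candidates, key=lambda a: a.score, reverse=True):
        if not succeeds(atom):
            continue  # step failed: fall back to the next-best candidate
        result = build_chain(atoms, goal, succeeds, known | atom.yields, chain + (atom,))
        if result is not None:
            return result
    return None  # dead end: the caller tries its next candidate


atoms = [
    Atom("read_note", "env_a", frozenset(), frozenset({"fact_1"}), 0.5),
    Atom("shortcut", "env_a", frozenset({"fact_1"}), frozenset({"goal"}), 0.9),
    Atom("lookup", "env_b", frozenset({"fact_1"}), frozenset({"fact_2"}), 0.6),
    Atom("finish", "env_c", frozenset({"fact_2"}), frozenset({"goal"}), 0.7),
]

# Simulate one step that always fails, forcing the longer three-environment path.
chain = build_chain(atoms, "goal", succeeds=lambda a: a.name != "shortcut")
print(chain)
```

In this toy run the highest-scoring step is rejected, so the search falls back to a longer path that crosses three environments. The structure mirrors the caption's description only at the level of control flow.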