T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search

Hyomin Lee; Sangwoo Park; Yumin Choi; Sohyun An; Seanie Lee; Sung Ju Hwang

Abstract

While prior red-teaming efforts have focused on eliciting harmful text outputs from large language models (LLMs), such approaches fail to capture agent-specific vulnerabilities that emerge through multi-step tool execution, particularly in rapidly growing ecosystems such as the Model Context Protocol (MCP). To address this gap, we propose a trajectory-aware evolutionary search method, T-MAP, which leverages execution trajectories to guide the discovery of adversarial prompts. Our approach enables the automatic generation of attacks that not only bypass safety guardrails but also reliably realize harmful objectives through actual tool interactions. Empirical evaluations across diverse MCP environments demonstrate that T-MAP substantially outperforms baselines in attack realization rate (ARR) and remains effective against frontier models, including GPT-5.2, Gemini-3-Pro, Qwen3.5, and GLM-5, thereby revealing previously underexplored vulnerabilities in autonomous LLM agents.

Paper Structure (45 sections, 3 equations, 33 figures, 8 tables)

Introduction
Related Work
Automated red-teaming.
Diversity-driven vulnerability discovery.
Safety and security of LLM agents.
Preliminaries
Red-teaming LLM agents.
Automated red-teaming via MAP-Elites.
T-MAP
Initialization.
Trajectory-guided mutation.
Evaluation and update.
Experiment
Experimental Setup
Environments.
...and 30 more sections

Figures (33)

Figure 1: Comparison between (top) chat-based LLM red-teaming and (bottom) LLM agent red-teaming.
Figure 2: Overview of T-MAP. Each iteration consists of four steps: (1) the $\texttt{LLM}_\textbf{Analyst}$ diagnoses success factors and failure causes from a parent-target cell pair, (2) the $\texttt{LLM}_\textbf{Mutator}$ generates a new prompt using these diagnostics and the Tool Call Graph (TCG), (3) the $\texttt{LLM}_\textbf{TCG}$ extracts edge-level outcomes from the execution trajectory to update the TCG, and (4) the $\texttt{LLM}_\textbf{Judge}$ evaluates the trajectory to update the archive.
Figure 3: Distribution of attack success levels across five different MCP environments.
Figure 4: ARR and RR over iterations (averaged across $5$ MCP environments, with $95\%$ confidence intervals shaded). See \ref{appendix:trend_all} for per-environment details.
Figure 5: Archive coverage heatmaps combined across 5 MCP environments. Each plot shows the average success level ($L_0$ to $L_3$) for cell $(c, s) \in \mathcal{C} \times \mathcal{S}$. Per-environment results are provided in \ref{appendix:heatmap_grid_5x5}.
...and 28 more figures
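
The four-step iteration described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function `t_map_step`, the uniform parent and target selection, the "replace only on a strictly higher success level" rule, the boolean edge outcomes, and the stub tool names `search` and `send_email` are all assumptions made for the example. The four LLM roles are passed in as plain callables so the sketch runs without any model.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Cell = Tuple[str, str]            # a cell (c, s), as in the Figure 5 caption
Elite = Tuple[str, int]           # (prompt, success level 0..3), hypothetical layout
Edge = Tuple[str, str]            # (tool_a, tool_b) edge in the Tool Call Graph

def t_map_step(
    archive: Dict[Cell, Elite],
    tcg: Dict[Edge, List[bool]],
    cells: List[Cell],
    analyst: Callable[[Elite, Optional[Elite]], str],
    mutator: Callable[[str, Cell, str, Dict[Edge, List[bool]]], str],
    run_agent: Callable[[str], List[str]],
    tcg_extractor: Callable[[List[str]], List[Tuple[Edge, bool]]],
    judge: Callable[[str, List[str]], int],
    rng: random.Random,
) -> Cell:
    # Assumption: parent and target cells are sampled uniformly.
    parent = rng.choice(sorted(archive))
    target = rng.choice(cells)
    # (1) Analyst diagnoses the parent/target cell pair.
    diagnosis = analyst(archive[parent], archive.get(target))
    # (2) Mutator writes a new prompt from the diagnosis and the TCG.
    prompt = mutator(archive[parent][0], target, diagnosis, tcg)
    trajectory = run_agent(prompt)  # executed tool calls, in order
    # (3) Record edge-level outcomes from the trajectory in the TCG.
    for edge, ok in tcg_extractor(trajectory):
        tcg.setdefault(edge, []).append(ok)
    # (4) Judge scores the trajectory; assumed rule: keep the higher level.
    level = judge(prompt, trajectory)
    incumbent = archive.get(target)
    if incumbent is None or level > incumbent[1]:
        archive[target] = (prompt, level)
    return target

# Stub run: every callable below is a placeholder, not a real model or tool.
cells = [("c0", "s0"), ("c0", "s1")]
archive: Dict[Cell, Elite] = {("c0", "s0"): ("seed prompt", 1)}
tcg: Dict[Edge, List[bool]] = {}
rng = random.Random(0)
for _ in range(5):
    t_map_step(
        archive, tcg, cells,
        analyst=lambda parent, target: "stub diagnosis",
        mutator=lambda p, target, diag, graph: f"{p} -> {target}",
        run_agent=lambda prompt: ["search", "send_email"],
        tcg_extractor=lambda t: [((a, b), True) for a, b in zip(t, t[1:])],
        judge=lambda prompt, t: min(3, len(t)),
        rng=rng,
    )
```

Passing the roles as callables keeps the loop independent of any particular model API; in practice each stub would be replaced by a model call and `run_agent` by an actual agent run in an MCP environment.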
