$\textit{Agents Under Siege}$: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks

Rana Muhammad Shahroz Khan, Zhen Tan, Sukwon Yun, Charles Fleming, Tianlong Chen

TL;DR

This work reveals critical safety vulnerabilities in pragmatic multi-agent LLM systems, where limited token bandwidth and asynchronous message delivery constrain communication. It introduces a permutation-invariant adversarial attack built on a $\text{Maximum-Flow Minimum-Cost}$ (MFMC) formulation and a novel $\text{Permutation-Invariant Evasion Loss (PIEL)}$, with a stochastic variant, $\text{S-PIEL}$, that trades off compute against effectiveness. Across diverse models and benchmarks, the attack outperforms conventional prompt attacks by up to $7\times$ (with attack success rates (ASR) as high as $94\%$ reported) and remains effective against several safety defenses, underscoring the urgent need for multi-agent-specific defenses. The results show that network topology, latency, and edge-level defenses shape adversarial risk in distributed LLM systems, and they motivate future work on robust, topology-aware safety mechanisms for multi-agent deployments.
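The TL;DR contrasts PIEL with its stochastic variant S-PIEL. The exact loss is not given in this summary, so the sketch below simply assumes that PIEL averages some per-ordering loss over every possible arrival order of the prompt chunks, and that S-PIEL estimates the same average from a few randomly sampled orders. The function names and the toy `adjacent_inversions` loss (a stand-in for a victim model's loss on the reassembled prompt) are hypothetical.

```python
import itertools
import random
from typing import Callable, Sequence

# Hypothetical per-ordering loss: maps one arrival order of chunks to a scalar.
OrderingLoss = Callable[[Sequence[str]], float]


def piel_exact(chunks: Sequence[str], ordering_loss: OrderingLoss) -> float:
    """Average the loss over all n! arrival orders, so the result ignores input order."""
    orders = list(itertools.permutations(chunks))
    return sum(ordering_loss(order) for order in orders) / len(orders)


def piel_stochastic(chunks: Sequence[str], ordering_loss: OrderingLoss,
                    num_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the same average from num_samples random orders."""
    total = 0.0
    for _ in range(num_samples):
        order = list(chunks)
        rng.shuffle(order)
        total += ordering_loss(order)
    return total / num_samples


def adjacent_inversions(order: Sequence[str]) -> float:
    """Toy stand-in for a model loss: counts adjacent chunk pairs that are out of order."""
    return float(sum(1 for a, b in zip(order, order[1:]) if a > b))


chunks = ["c0", "c1", "c2", "c3"]
print(piel_exact(chunks, adjacent_inversions))  # 1.5
print(piel_stochastic(chunks, adjacent_inversions, num_samples=2000, rng=random.Random(0)))
```

Exact enumeration needs $n!$ loss evaluations (40,320 for eight chunks), which is why a sampled estimate can trade a little accuracy for much less compute.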

Abstract

Most discussions of Large Language Model (LLM) safety have focused on single-agent settings, but multi-agent LLM systems create novel adversarial risks because their behavior depends on inter-agent communication and decentralized reasoning. In this work, we focus on attacking pragmatic systems that operate under constraints such as limited token bandwidth, latency in message delivery, and deployed defense mechanisms. We design a $\textit{permutation-invariant adversarial attack}$ that optimizes how prompt chunks are distributed across latency- and bandwidth-constrained network topologies in order to bypass the system's distributed safety mechanisms. By formulating the attack path as a $\textit{maximum-flow minimum-cost}$ problem, coupled with the novel $\textit{Permutation-Invariant Evasion Loss (PIEL)}$, we leverage graph-based optimization to maximize attack success rate while minimizing detection risk. Evaluating models including $\texttt{Llama}$, $\texttt{Mistral}$, $\texttt{Gemma}$, $\texttt{DeepSeek}$, and other variants on datasets such as $\texttt{JailBreakBench}$ and $\texttt{AdversarialBench}$, we find that our method outperforms conventional attacks by up to $7\times$, exposing critical vulnerabilities in multi-agent systems. Moreover, we demonstrate that existing defenses, including variants of $\texttt{Llama-Guard}$ and $\texttt{PromptGuard}$, fail to prevent our attack, emphasizing the urgent need for multi-agent-specific safety mechanisms.
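To make the maximum-flow minimum-cost framing concrete, here is a minimal sketch, assuming each directed edge between agents has an integer token bandwidth (its capacity) and a hypothetical per-token detection-risk score (its cost). It uses the standard successive-shortest-path algorithm with Bellman-Ford on the residual graph; the class name, the four-agent graph, and all numbers are illustrative and are not taken from the paper.

```python
import math
from typing import List, Tuple


class MinCostMaxFlow:
    """Successive-shortest-path min-cost max-flow (Bellman-Ford on the residual graph)."""

    def __init__(self, num_nodes: int) -> None:
        self.n = num_nodes
        self.adj: List[List[int]] = [[] for _ in range(num_nodes)]
        # Edges are stored in pairs: index 2k is a forward edge, 2k + 1 its reverse.
        self.head: List[int] = []
        self.cap: List[int] = []
        self.cost: List[int] = []

    def add_edge(self, u: int, v: int, capacity: int, cost: int) -> int:
        """Add u -> v with a token-bandwidth capacity and per-token risk cost; return its id."""
        edge_id = len(self.head)
        for a, b, c, w in ((u, v, capacity, cost), (v, u, 0, -cost)):
            self.adj[a].append(len(self.head))
            self.head.append(b)
            self.cap.append(c)
            self.cost.append(w)
        return edge_id

    def flow_on(self, edge_id: int) -> int:
        """Flow on a forward edge equals the residual capacity of its reverse edge."""
        return self.cap[edge_id ^ 1]

    def solve(self, source: int, sink: int) -> Tuple[int, int]:
        total_flow, total_cost = 0, 0
        while True:
            # Bellman-Ford: cheapest path in the residual graph (reverse edges may be negative).
            dist = [math.inf] * self.n
            dist[source] = 0
            via = [-1] * self.n  # edge used to reach each node on the cheapest path
            for _ in range(self.n - 1):
                changed = False
                for u in range(self.n):
                    if dist[u] == math.inf:
                        continue
                    for e in self.adj[u]:
                        v = self.head[e]
                        if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                            dist[v] = dist[u] + self.cost[e]
                            via[v] = e
                            changed = True
                if not changed:
                    break
            if dist[sink] == math.inf:
                return total_flow, total_cost  # no augmenting path left
            # Bottleneck capacity along the cheapest augmenting path.
            push, v = math.inf, sink
            while v != source:
                push = min(push, self.cap[via[v]])
                v = self.head[via[v] ^ 1]
            # Augment: use up forward capacity, open up reverse capacity.
            v = sink
            while v != source:
                e = via[v]
                self.cap[e] -= push
                self.cap[e ^ 1] += push
                v = self.head[e ^ 1]
            total_flow += push
            total_cost += push * dist[sink]


# Hypothetical topology: 0 = entry agent A, 1 = relay B, 2 = relay C, 3 = target T.
net = MinCostMaxFlow(4)
a_b = net.add_edge(0, 1, capacity=5, cost=1)
a_c = net.add_edge(0, 2, capacity=5, cost=4)
b_t = net.add_edge(1, 3, capacity=3, cost=1)
b_c = net.add_edge(1, 2, capacity=2, cost=1)
c_t = net.add_edge(2, 3, capacity=3, cost=1)
flow, cost = net.solve(0, 3)
print(flow, cost, net.flow_on(a_c))  # 6 17 1
```

On this toy graph the low-risk relay through B carries 5 of the 6 units and the riskier direct edge A→C carries only 1, for a total cost of 17; sending 3 units over each of A→B and A→C would also saturate the target's incoming bandwidth but cost 21.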

Paper Structure

This paper contains 43 sections, 8 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Adversarial attack in multi-agent LLM systems. Top: Network topology showing communication between agents and the targeted string attack flow from source to target. Bottom: Comparison between existing approaches that fail under constraints and our method using MFMC problem formulation and Permutation-Invariant Loss, which successfully bypasses safety mechanisms while respecting constraints.
  • Figure 2: Process of generating and optimizing adversarial prompt chunks for multi-agent LLM systems. (a) Multi-agent Topologies: Different network structures including Chain, Tree, Random Graph, and Complete Graph that influence attack effectiveness. (b) Topological Optimization: Identifying optimal paths based on bandwidth constraints and detection risk, with chunks strategically distributed across the network. (c) Permutation Invariance: Due to network latency, prompt chunks may arrive in different orders, creating a sampling space where optimized chunks remain effective regardless of arrival sequence, successfully bypassing safety mechanisms.
  • Figure 3: Detection efficacy of different safety mechanisms against adversarial prompts.
  • Figure 4: Impact of different network topologies on attack success rate (ASR) in a multi-agent LLM system.
  • Figure 5: Effect of chunk length on the detection of PromptGuard-86M and its 4-bit quantized version.
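Figures 2(b), 2(c), and 5 describe prompts split into chunks that must fit each edge's bandwidth and that may arrive out of order under network latency. The sketch below illustrates both effects under simple assumptions: whitespace-separated words stand in for model tokens, `max_tokens` is a hypothetical per-message budget, and each chunk's latency is drawn independently and uniformly at random. The helper names are illustrative, not from the paper.

```python
import random
from typing import List


def split_into_chunks(prompt: str, max_tokens: int) -> List[str]:
    """Split a prompt so that each chunk holds at most max_tokens whitespace tokens."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    tokens = prompt.split()
    return [" ".join(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]


def simulate_arrival(chunks: List[str], rng: random.Random,
                     max_latency_ms: float = 100.0) -> List[str]:
    """Give each chunk an independent random latency and return chunks in arrival order."""
    timed = [(rng.uniform(0.0, max_latency_ms), idx, chunk) for idx, chunk in enumerate(chunks)]
    timed.sort()  # earliest arrival first; the index breaks (unlikely) latency ties
    return [chunk for _, _, chunk in timed]


chunks = split_into_chunks("a b c d e f g", max_tokens=3)
print(chunks)  # ['a b c', 'd e f', 'g']
arrived = simulate_arrival(chunks, random.Random(7))
print(arrived)  # some permutation of the three chunks
```

A smaller `max_tokens` produces more chunks and therefore more possible arrival orders ($k!$ for $k$ chunks), so chunk length affects both what each detector sees (Figure 5) and how much ordering variation the attack must tolerate.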