AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems

Faouzi El Yagoubi, Ranwa Al Mallah, Godwin Badu-Marfo

TL;DR

AgentLeak is presented as the first full-stack benchmark for quantifying privacy leakage across internal and external channels in multi-agent LLM systems. It evaluates 1,000 scenarios across healthcare, finance, legal, and corporate domains using a seven-channel taxonomy and a three-tier detection pipeline. The results show that internal channels, especially inter-agent messages and shared memory, leak at higher rates than the final output, and that output-only audits miss about 41.7% of violations. The study evaluates five production LLMs, reports a significant security-utility tradeoff, and argues for privacy-aware coordination frameworks with internal-channel protections. The authors release a dataset, detection pipeline, and SDK integrations to support reproducible benchmarking and guide defenses in regulated deployments. The findings bear on architecture design, compliance, and future benchmark development for multi-agent AI systems.

Abstract

Multi-agent Large Language Model (LLM) systems create privacy risks that current benchmarks cannot measure. When agents coordinate on tasks, sensitive data passes through inter-agent messages, shared memory, and tool arguments, all pathways that output-only audits never inspect. We introduce AgentLeak, to the best of our knowledge the first full-stack benchmark for privacy leakage covering internal channels, spanning 1,000 scenarios across healthcare, finance, legal, and corporate domains, paired with a 32-class attack taxonomy and a three-tier detection pipeline. Testing GPT-4o, GPT-4o-mini, Claude 3.5 Sonnet, Mistral Large, and Llama 3.3 70B across 4,979 traces reveals that multi-agent configurations reduce per-channel output leakage (C1: 27.2% vs 43.2% in single-agent) but introduce unmonitored internal channels that raise total system exposure to 68.9% (OR-aggregated across C1, C2, C5). Internal channels account for most of this gap: inter-agent messages (C2) leak at 68.8%, compared to 27.2% on the output channel (C1). As a result, output-only audits miss 41.7% of violations. Claude 3.5 Sonnet, which emphasizes safety alignment in its design, achieves the lowest leakage rates on both external (3.3%) and internal (28.1%) channels, suggesting that model-level safety training may transfer to internal channel protection. Across all five models and four domains, the pattern C2 > C1 holds consistently, confirming that inter-agent communication is the primary vulnerability. These findings underscore the need for coordination frameworks that enforce privacy controls on internal channels, particularly inter-agent communication.
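
The relationship between the per-channel rate, the OR-aggregated exposure, and the output-only audit gap can be illustrated with a minimal sketch. The `Trace` structure, the `summarize` helper, and the toy data below are assumptions made for illustration; they are not the AgentLeak SDK or dataset. The sketch assumes each trace carries one boolean leak flag per channel, counts a trace as exposed if any of C1, C2, or C5 leaked, and takes the audit gap to be the share of traces that leak internally while C1 is clean. Under that reading the gap equals total exposure minus the C1 rate, which appears consistent with the reported figures (68.9% - 27.2% = 41.7%).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """Per-channel leak flags for one run (hypothetical structure)."""
    c1: bool  # final output
    c2: bool  # inter-agent messages
    c5: bool  # shared memory


def rate(flags):
    """Fraction of True values in an iterable of booleans."""
    flags = list(flags)
    if not flags:
        raise ValueError("need at least one trace")
    return sum(flags) / len(flags)


def summarize(traces):
    """Return C1 rate, OR-aggregated exposure, and output-only audit gap."""
    traces = list(traces)
    return {
        # What an output-only audit would observe.
        "c1": rate(t.c1 for t in traces),
        # A trace is exposed if any monitored channel leaked (logical OR).
        "total": rate(t.c1 or t.c2 or t.c5 for t in traces),
        # Internal leak while the final output passes the check.
        "audit_gap": rate((t.c2 or t.c5) and not t.c1 for t in traces),
    }


# Toy data, invented for illustration only.
toy = [
    Trace(c1=True, c2=True, c5=False),
    Trace(c1=False, c2=True, c5=True),
    Trace(c1=False, c2=True, c5=False),
    Trace(c1=False, c2=False, c5=False),
]
stats = summarize(toy)
print(stats)
```

Because the set of exposed traces is the union of C1 leaks and internal-only leaks, `stats["total"] - stats["c1"]` always equals `stats["audit_gap"]` in this sketch, whatever data is supplied.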

Paper Structure (34 sections, 1 equation, 10 figures, 13 tables)

Figures (10)

  • Figure 1: Operational definitions for leakage and risk metrics used in this study.
  • Figure 2: AgentLeak framework-agnostic evaluation harness. The Software Development Kit (SDK) intercepts trace events across all seven channels (C1-C7), applies the three-tier detection pipeline, and produces standardized metrics regardless of the underlying multi-agent framework. This benchmark design enables consistent privacy evaluation across LangChain, CrewAI, AutoGPT, MetaGPT, and custom implementations.
  • Figure 3: AgentLeak's multi-agent system architecture showing the seven leakage channels. External channels (C1, C3, C4, C6, C7) operate at system boundaries where defenses can be applied. Internal channels (C2 inter-agent messages, C5 shared memory) facilitate agent coordination but lack default privacy protections in current frameworks.
  • Figure 4: Channel-by-channel leakage rates (n=4,979). C1 (final output): 27.2%. Internal channels show higher rates: C2 (inter-agent): 68.8%, C5 (memory): 46.7%. The pattern C2 > C1 holds across all five models and four domains.
  • Figure 5: Output-only audit gap (n=4,979 traces). 41.7% of violations occur in internal channels while the final output passes checks, so audits inspecting only C1 miss nearly half of all violations.
  • ...and 5 more figures