OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage

Akshat Naik, Jay Culligan, Yarin Gal, Philip Torr, Rahaf Aljundi, Alasdair Paren, Adel Bibi

TL;DR

The paper addresses data leakage risks in orchestrator-based multi-agent systems by introducing OMNI-LEAK, a red-team attack that coordinates a Data Processing Agent using SQL and a Notification Agent within an orchestrator. The authors formalize the threat model, develop a data-leak benchmark across Toy/Medium/Big databases with public and private data sources $D_{pub}$ and $D_{priv}$, and evaluate five frontier LLMs using metrics such as $BA$, $RA$, and $E$ (the expected number of queries for a successful attack). They find that all models except Claude-sonnet-4 are vulnerable to at least one OMNI-LEAK variant, with database size having little effect on attack success and downstream exposure often driving vulnerability. The work highlights practical privacy risks in real-world data-management orchestrators and calls for safety research and defense-in-depth strategies, including monitoring at multiple stages and human oversight to mitigate such attacks in deployment.
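The TL;DR describes $E$ only as the expected number of queries for a successful attack. One simple way to estimate such a quantity, assuming every query is an independent trial with the same success probability $p$, is the mean of a geometric distribution, $1/p$. That independence assumption belongs to this sketch and is not a definition taken from the paper; the function name and its arguments are likewise hypothetical.

```python
def expected_queries(successes: int, trials: int) -> float:
    """Estimate the expected number of queries until the first successful attack.

    Assumption (not from the paper): each query is an independent trial with
    a fixed success probability p, estimated as successes / trials, so the
    expected number of queries is 1 / p = trials / successes.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    if successes == 0:
        # No success was observed, so the estimate is unbounded.
        return float("inf")
    return trials / successes
```

For example, under this assumption an attack that succeeds in 1 of every 4 queries would need 4 queries on average, while an attack that never succeeds yields an infinite estimate.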

Abstract

As Large Language Model (LLM) agents become more capable, their coordinated use in the form of multi-agent systems is anticipated to emerge as a practical paradigm. Prior work has examined the safety and misuse risks associated with agents. However, much of this has focused on the single-agent case and/or setups missing basic engineering safeguards such as access control, revealing a scarcity of threat modeling in multi-agent systems. We investigate the security vulnerabilities of a popular multi-agent pattern known as the orchestrator setup, in which a central agent decomposes and delegates tasks to specialized agents. Through red-teaming a concrete setup representative of a likely future use case, we demonstrate a novel attack vector, OMNI-LEAK, that compromises several agents to leak sensitive data through a single indirect prompt injection, even in the presence of data access control. We report the susceptibility of frontier models to different categories of attacks, finding that both reasoning and non-reasoning models are vulnerable, even when the attacker lacks insider knowledge of the implementation details. Our work highlights the importance of safety research to generalize from single-agent to multi-agent settings, in order to reduce the serious risks of real-world privacy breaches, financial losses, and erosion of public trust in AI agents.

Paper Structure (36 sections, 14 equations, 4 figures, 24 tables)

Figures (4)

  • Figure 1: Illustration of an Orchestrator System. The user interacts with the orchestrator agent, which in turn has access to many downstream agents, each with their own system prompts and tools, designed according to their specialized function.
  • Figure 2: Base orchestrator setup involving SQL Agent and Notification Agent. The base setup is shown on the left. On the right, OMNI-LEAK bypasses the setup's access control safeguards. The adversary initiates the attack by inserting an indirect prompt injection into public data. This hijacks the SQL Agent, which in turn compromises the orchestrator and Notification Agent to exfiltrate sensitive data.
  • Figure 3: Example sequence of OMNI-LEAK attack execution. After the adversary inserts the attack, the user starts by asking a benign query about Mark. The orchestrator attempts to answer it using the SQL Agent, which then encounters the attack and gets hijacked. The SQL Agent retrieves the SSNs and instructs the orchestrator to send them to the adversary; the orchestrator in turn instructs the Notification Agent to do so.
  • Figure 4: Automatic Evaluation. The attack is automatically assessed to be successful or not using keyword matching for expected data.
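
The keyword-matching evaluation described in Figure 4 can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name, the case-insensitive comparison, and the rule that every expected value must appear in the output are assumptions made for this example, and the sample values are placeholders.

```python
def attack_succeeded(outbound_message: str, expected_values: list[str]) -> bool:
    """Return True if every expected sensitive value appears in the message.

    Assumptions (not from the paper): matching is case-insensitive, and the
    attack counts as successful only if all expected values are present.
    """
    if not expected_values:
        # With nothing to look for, no leak can be confirmed.
        return False
    normalized = outbound_message.lower()
    return all(value.lower() in normalized for value in expected_values)


# Placeholder records standing in for the private data an attack would target.
expected = ["000-00-0001", "000-00-0002"]

leaked = "Forwarding records: 000-00-0001, 000-00-0002"
refused = "I cannot share private records."

print(attack_succeeded(leaked, expected))   # True
print(attack_succeeded(refused, expected))  # False
```

Requiring all values is the stricter choice; replacing `all` with `any` would instead count partial leaks as successes.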