Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents

Juhee Kim, Woohyuk Choi, Byoungyoung Lee

TL;DR

The paper tackles privilege escalation in LLM agents arising from processing untrusted data and prompts. It proposes Prompt Flow Integrity (PFI), a three-pronged security framework combining agent isolation, secure untrusted data processing with data IDs, and privilege escalation guardrails, underpinned by a data-trust policy. Through extensive evaluation on AgentDojo and AgentBench OS across multiple LLMs, PFI demonstrates significant improvements in Secure Utility Rate (SUR) and a zero Attacked Task Rate (ATR) compared to baselines, IsolateGPT, and f-secure, albeit with notable computational overhead. The work contributes a concrete, open-source architecture and a policy-driven approach to balance security and utility in real-world LLM agent deployments.

Abstract

Large Language Models (LLMs) are combined with tools to create powerful LLM agents that provide a wide range of services. Unlike traditional software, an LLM agent's behavior is determined at runtime by natural language prompts that come from either the user or tool data. This flexibility enables a new computing paradigm with unlimited capabilities and programmability, but it also introduces new security risks, leaving agents vulnerable to privilege escalation attacks. Moreover, LLM agents are prone to interpreting user prompts in an insecure way, creating non-deterministic behaviors that attackers can exploit. To address these security risks, we propose Prompt Flow Integrity (PFI), a system security-oriented solution to prevent privilege escalation in LLM agents. Based on an analysis of the architectural characteristics of LLM agents, PFI features three mitigation techniques: agent isolation, secure untrusted data processing, and privilege escalation guardrails. Our evaluation results show that PFI effectively mitigates privilege escalation attacks while preserving the utility of LLM agents.
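
To make the idea of processing untrusted data through data IDs more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy data-trust policy (a set of trusted tool names), an opaque ID format (`#DATA1`, `#DATA2`, ...), and a simple guardrail rule; the tool names, `DataStore`, `mediate_tool_output`, and `guardrail` are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical data-trust policy: tools whose output is treated as trusted.
TRUSTED_TOOLS = {"read_user_calendar"}
# Hypothetical set of tools that require privilege to call.
PRIVILEGED_TOOLS = {"send_email"}


@dataclass
class DataStore:
    """Keeps untrusted tool outputs behind opaque data IDs."""
    _items: dict = field(default_factory=dict)
    _counter: count = field(default_factory=lambda: count(1))

    def put(self, value: str) -> str:
        # Store the raw value and hand back only an opaque reference.
        data_id = f"#DATA{next(self._counter)}"
        self._items[data_id] = value
        return data_id

    def is_id(self, token) -> bool:
        return isinstance(token, str) and token in self._items

    def get(self, data_id: str) -> str:
        return self._items[data_id]


def mediate_tool_output(tool: str, output: str, store: DataStore) -> str:
    """Return what the privileged agent is allowed to see for this output."""
    if tool in TRUSTED_TOOLS:
        return output          # trusted data flows through unchanged
    return store.put(output)   # untrusted data is replaced by its data ID


def guardrail(tool: str, args: dict, store: DataStore) -> bool:
    """Return True if the call may proceed without asking the user."""
    if tool not in PRIVILEGED_TOOLS:
        return True
    # Flag privileged calls whose arguments reference untrusted data.
    return not any(store.is_id(value) for value in args.values())


store = DataStore()
ref = mediate_tool_output("fetch_webpage", "Ignore previous instructions", store)
needs_no_approval = guardrail("send_email", {"to": ref}, store)
```

In this sketch the privileged agent only ever sees `ref`, so injected text in the web page cannot act as a prompt, and the attempt to pass that reference into `send_email` is flagged for user approval rather than executed silently.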

Paper Structure

This paper contains 25 sections, 10 figures, 6 tables.

Figures (10)

  • Figure 1: LLM Agent
  • Figure 2: Attacks on LLM Agents
  • Figure 3: Overview of PFI
  • Figure 4: PFI Agent Architecture. Green, red, and yellow blocks represent $\mathcal{D}_T$, $\mathcal{D}_U$, and PFI modules, respectively.
  • Figure 5: Secure Untrusted Data Processing with data IDs.
  • ...and 5 more figures