Design Patterns for Securing LLM Agents against Prompt Injections

Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn

TL;DR

This paper tackles the security challenge of prompt injections in LLM-powered agents. It presents six principled design patterns—Action-Selector, Plan-Then-Execute, LLM Map-Reduce, Dual LLM, Code-Then-Execute, and Context-Minimization—to constrain agents and isolate untrusted inputs. Through ten diverse case studies (OS, SQL, email/calendar, customer service, booking, product recommender, resume screening, medication, medical diagnosis, and software engineering), it demonstrates how these patterns can be applied to achieve meaningful resistance to prompt injections while maintaining useful functionality. The work provides concrete design guidance and best-practice recommendations for developers and decision-makers aiming to deploy secure, application-specific LLM agents in real-world environments.

Abstract

As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. Among the most pressing threats are prompt injection attacks, which exploit the agent's reliance on natural language inputs -- an especially dangerous threat when agents are granted tool access or handle sensitive information. In this work, we propose a set of principled design patterns for building AI agents with provable resistance to prompt injection. We systematically analyze these patterns, discuss their trade-offs in terms of utility and security, and illustrate their real-world applicability through a series of case studies.

Paper Structure

This paper contains 88 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: The action-selector pattern. The red color represents untrusted data. The LLM acts as a translator between a natural language prompt and a series of pre-defined actions to be executed over untrusted data.
  • Figure 2: The plan-then-execute pattern. Before processing any untrusted data, the LLM defines a plan consisting of a series of allowed tool calls. A prompt injection cannot force the LLM into executing a tool that is not part of the defined plan.
  • Figure 3: The LLM map-reduce pattern. Untrusted documents are processed independently, to ensure that a malicious document cannot impact the processing of another document.
  • Figure 4: The dual LLM pattern. A privileged LLM has access to tools but never processes untrusted data. This LLM can call a quarantined LLM to process untrusted data, but without any tool access. Results from processing untrusted data are stored in a memory that the privileged LLM can manipulate by reference only.
  • Figure 5: The code-then-execute pattern. The LLM writes a piece of code that can call tools and make calls to other LLMs. The code is then run on untrusted data.
  • ...and 2 more figures
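
The dual LLM pattern described in the Figure 4 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: both LLMs are replaced by deterministic stubs, and every name here (`QuarantineMemory`, `quarantined_llm`, `privileged_plan`, `run`, the `$VAR` reference format, the `send_email` tool, and the example address) is hypothetical. The point it tries to show is that the privileged side only ever handles an opaque reference, and that plain code substitutes the stored value at the tool boundary.

```python
from typing import Dict, List, Tuple


class QuarantineMemory:
    """Holds results derived from untrusted data and hands out opaque references."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}

    def store(self, value: str) -> str:
        ref = f"$VAR{len(self._values) + 1}"  # hypothetical reference format
        self._values[ref] = value
        return ref

    def contains(self, ref: str) -> bool:
        return ref in self._values

    def resolve(self, ref: str) -> str:
        return self._values[ref]


def quarantined_llm(instruction: str, untrusted_text: str) -> str:
    """Stub for the quarantined LLM: it reads untrusted text but has no tools.

    A real system would call a model here; this stub returns the first line,
    truncated to 80 characters, so the example stays deterministic.
    """
    lines = untrusted_text.strip().splitlines()
    return lines[0][:80] if lines else ""


def privileged_plan(user_request: str, ref: str) -> List[Tuple[str, Dict[str, str]]]:
    """Stub for the privileged LLM: it sees the trusted request and a reference only.

    The untrusted text is never passed in, so it cannot change which tool is
    chosen or who the recipient is.
    """
    return [("send_email", {"to": "user@example.com", "body": ref})]


def run(user_request: str, untrusted_document: str) -> List[Dict[str, str]]:
    memory = QuarantineMemory()
    outbox: List[Dict[str, str]] = []  # stands in for a real email tool

    # 1. Untrusted data goes to the quarantined side; only a reference comes back.
    ref = memory.store(quarantined_llm("Summarize this document.", untrusted_document))

    # 2. The privileged side plans tool calls in terms of that reference.
    for tool, args in privileged_plan(user_request, ref):
        if tool != "send_email":
            raise ValueError(f"tool not allowed: {tool}")
        # 3. Ordinary code dereferences the value at the tool boundary.
        resolved = {
            key: memory.resolve(value) if memory.contains(value) else value
            for key, value in args.items()
        }
        outbox.append(resolved)
    return outbox
```

In this sketch an injected instruction inside the document still ends up in the email body as data, but it never reaches the component that selects tools or recipients. Whether that is sufficient depends on the application, since the recipient of the output may itself act on the injected text.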