Design Patterns for Securing LLM Agents against Prompt Injections
Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn
TL;DR
This paper tackles the security challenge of prompt injections in LLM-powered agents. It presents six principled design patterns (Action-Selector, Plan-Then-Execute, LLM Map-Reduce, Dual LLM, Code-Then-Execute, and Context-Minimization) that constrain agents and isolate untrusted inputs. Through ten diverse case studies (OS, SQL, email/calendar, customer service, booking, product recommender, resume screening, medication, medical diagnosis, and software engineering), it demonstrates how these patterns can be applied to achieve meaningful resistance to prompt injections while maintaining useful functionality. The work provides concrete design guidance and best-practice recommendations for developers and decision-makers aiming to deploy secure, application-specific LLM agents in real-world environments.
Abstract
As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. Among the most pressing threats are prompt injection attacks, which exploit the agent's reliance on natural language inputs, an especially dangerous threat when agents are granted tool access or handle sensitive information. In this work, we propose a set of principled design patterns for building AI agents with provable resistance to prompt injection. We systematically analyze these patterns, discuss their trade-offs in terms of utility and security, and illustrate their real-world applicability through a series of case studies.
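To make the idea of constraining an agent concrete, here is a minimal sketch in the spirit of the Action-Selector pattern named in the TL;DR. It is an illustration under stated assumptions, not the authors' implementation: the action names, the `stub_llm_choose` function (a stand-in for a real model call), and the refusal message are all hypothetical. The point it shows is that the model only selects from a fixed allowlist, and no action output is ever fed back to the model.

```python
from typing import Callable, Dict


# Hypothetical predefined actions; the agent can trigger these and nothing else.
def show_order_status() -> str:
    return "Your order has shipped."


def show_return_policy() -> str:
    return "Returns are accepted within 30 days."


ACTIONS: Dict[str, Callable[[], str]] = {
    "show_order_status": show_order_status,
    "show_return_policy": show_return_policy,
}


def stub_llm_choose(user_prompt: str) -> str:
    """Stand-in for an LLM call (assumption): maps a prompt to an action name.

    The "ignore previous instructions" branch simulates a hijacked model
    that tries to emit an action outside the allowlist.
    """
    text = user_prompt.lower()
    if "ignore previous instructions" in text:
        return "delete_all_orders"
    if "return" in text:
        return "show_return_policy"
    return "show_order_status"


def run_agent(user_prompt: str,
              choose: Callable[[str], str] = stub_llm_choose) -> str:
    """Run one turn: the model picks an action name, the code validates it."""
    choice = choose(user_prompt)
    action = ACTIONS.get(choice)
    if action is None:
        # Anything outside the allowlist is rejected, whatever the model said.
        return "Sorry, I cannot help with that."
    # The action result goes straight to the user, never back into the model.
    return action()
```

In this sketch, even if injected text fully controls the model's output, the worst it can do is trigger one of the predefined actions. The cost is reduced flexibility, which is the kind of utility versus security trade-off the abstract refers to.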
