Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa, Kai Greshake, Hanshen Xiao, Chaowei Xiao, G. Edward Suh

Abstract

AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper discusses our vision for system-level defenses against indirect prompt injection attacks. We articulate three positions: (1) dynamic replanning and security policy updates are often necessary for dynamic tasks and realistic environments; (2) certain context-dependent security decisions would still require LLMs (or other learned models), but should only be made within system designs that strictly constrain what the model can observe and decide; (3) in inherently ambiguous cases, personalization and human interaction should be treated as core design considerations. In addition to our main positions, we discuss limitations of existing benchmarks that can create a false sense of utility and security. We also highlight the value of system-level defenses, which serve as the skeleton of agentic systems by structuring and controlling agent behaviors, integrating rule-based and model-based security checks, and enabling more targeted research on model robustness and human interaction.

Paper Structure

This paper contains 10 sections and 1 figure.

Figures (1)

  • Figure 1: High-level system architecture for building LLM agents with both high utility and strong security: (1) Given a task, the orchestrator generates a plan and policy. (2) The plan/policy approver oversees this generation process to ensure that the resulting plan and policy are reasonable. (3) The executor takes the plan and generates a concrete action. (4) The policy enforcer approves or blocks the action based on the policy. (5.i) If the action is approved, it is issued to the environment, which returns a response. (5.ii) If the action is rejected, the policy enforcer sends negative feedback to the executor. (6) The executor processes feedback from either the environment or the policy enforcer, which can further trigger the orchestrator to update the plan and policy for the next iteration. Notes: Blue shields indicate where security-critical decisions can occur and therefore require special security design. Human icons indicate checkpoints that may require explicit human-in-the-loop oversight, such as personalizing system behavior or resolving ambiguous objective alignment.
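
To make the control loop of Figure 1 concrete, the following is a minimal Python sketch of the orchestrator-executor-enforcer interaction. All names here (`Orchestrator`, `Executor`, `PolicyEnforcer`, `run_agent`, the tool-whitelist `Policy`) are our own illustrative stand-ins: the paper specifies an architecture, not an API, and a whitelist of permitted tools is only one simple instantiation of a security policy.

```python
# Illustrative sketch of the Figure 1 control loop. Every class and
# method name below is a hypothetical stand-in, not an API from the paper.

from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical policy: the set of tool names the executor may invoke."""
    allowed_tools: set = field(default_factory=set)


@dataclass
class Action:
    tool: str
    args: dict


class PolicyEnforcer:
    # (4) Approves or blocks each concrete action against the policy.
    def check(self, action: Action, policy: Policy) -> bool:
        return action.tool in policy.allowed_tools


class Orchestrator:
    # (1) Generates a plan (here, a list of actions) and a matching policy;
    # (6) can regenerate both when feedback indicates the plan failed.
    def plan(self, task, feedback=None):
        steps = [Action("search", {"query": task})]
        policy = Policy(allowed_tools={"search"})
        return steps, policy


class Executor:
    # (3) Turns the plan into concrete actions, one step at a time.
    def next_action(self, plan):
        return plan.pop(0) if plan else None


def run_agent(task, env, max_steps=10):
    orchestrator, executor, enforcer = Orchestrator(), Executor(), PolicyEnforcer()
    plan, policy = orchestrator.plan(task)
    # (2) A plan/policy approver (possibly a human) would review plan
    # and policy here before execution begins.
    feedback = None
    for _ in range(max_steps):
        action = executor.next_action(plan)
        if action is None:  # plan finished
            break
        if enforcer.check(action, policy):
            feedback = env(action)                # (5.i) issue approved action
        else:
            feedback = {"blocked": action.tool}   # (5.ii) negative feedback
            # (6) A blocked action triggers replanning and a policy update.
            plan, policy = orchestrator.plan(task, feedback=feedback)
    return feedback
```

For example, `run_agent("summarize inbox", env=lambda a: {"ok": True, "tool": a.tool})` walks a single search step through the enforcer. The design point the sketch preserves is that the policy enforcer, not the (LLM-driven) executor, gates every action: untrusted data can influence what the executor proposes, but only actions satisfying the current policy ever reach the environment, and rejections feed back into replanning rather than being silently retried.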