Optimizing Agent Planning for Security and Autonomy

Aashish Kolluri, Rishi Sharma, Manuel Costa, Boris Köpf, Tobias Nießen, Mark Russinovich, Shruti Tople, Santiago Zanella-Béguelin

TL;DR

This paper addresses indirect prompt injection attacks (PIAs) on AI agents by advocating deterministic defenses based on information-flow control (IFC), which guarantee policy-compliant actions but can hinder task performance. It introduces two autonomy metrics, HITL load and TCR@k, to quantify the reduction in human oversight enabled by IFC and policy-aware planning. The authors present Prudentia, an IFC-aware planner built atop Fides, which combines policy-awareness, strategic variable expansion, and endorsement- versus approval-based data handling to maximize autonomy while preserving security and task utility. Evaluations on AgentDojo and WASP show that Prudentia achieves higher autonomy than prior IFC-based defenses with comparable or improved task completion, including zero HITL load for data-independent tasks.

Abstract

Indirect prompt injection attacks threaten AI agents that execute consequential actions, motivating deterministic system-level defenses. Such defenses can provably block unsafe actions by enforcing confidentiality and integrity policies, but currently appear costly: they reduce task completion rates and increase token usage compared to probabilistic defenses. We argue that existing evaluations miss a key benefit of system-level defenses: reduced reliance on human oversight. We introduce autonomy metrics to quantify this benefit: the fraction of consequential actions an agent can execute without human-in-the-loop (HITL) approval while preserving security. To increase autonomy, we design a security-aware agent that (i) introduces richer HITL interactions, and (ii) explicitly plans for both task progress and policy compliance. We implement this agent design atop an existing information-flow control defense against prompt injection and evaluate it on the AgentDojo and WASP benchmarks. Experiments show that this approach yields higher autonomy without sacrificing utility.

Paper Structure (38 sections, 3 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Performance comparison across key metrics for o3-mini and o4-mini models. Left: Task Completion Rate (higher is better). Center: HITL load (lower indicates better autonomy). Right: $\mathsf{TCR}\text{@}0$ (higher indicates improved full autonomy).
  • Figure 2: $\mathsf{TCR}\text{@}k$ curves showing task completion as a function of HITL load. Higher curves indicate better autonomy-utility trade-offs. Prudentia consistently outperforms baselines, achieving higher autonomy with fewer human interventions.
  • Figure 3: TCR@$k$ for $k \in \{\, 0, 1, 2, \infty \,\}$ and total HITL load across all successful tasks. Tasks are categorized as suggested by fides2025, i.e., DD refers to data-dependent tasks.
  • Figure 4: Task Completion Rates with unlimited HITL load for each implementation across different models.
  • Figure 5: Total HITL interaction count across all successfully completed tasks for each implementation across different models.
  • ...and 1 more figure
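
The captions above refer to $\mathsf{TCR}\text{@}k$ and HITL load without restating their definitions. The following is a minimal sketch of how such metrics could be computed from per-task logs. It assumes that $\mathsf{TCR}\text{@}k$ is the fraction of all tasks completed using at most $k$ human-in-the-loop interactions, and that total HITL load sums interactions over successfully completed tasks, as the Figure 3 and Figure 5 captions suggest. The paper's formal definitions may differ. The `TaskRun` record, the function names, and the sample data are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRun:
    """Hypothetical per-task log entry (not a structure from the paper)."""
    completed: bool          # did the agent finish the task successfully?
    hitl_interactions: int   # number of human approvals or endorsements requested


def tcr_at_k(runs, k):
    """Assumed TCR@k: share of all tasks completed with at most k HITL interactions."""
    if not runs:
        return 0.0
    within_budget = sum(1 for r in runs if r.completed and r.hitl_interactions <= k)
    return within_budget / len(runs)


def total_hitl_load(runs):
    """Assumed total HITL load: interactions summed over successfully completed tasks."""
    return sum(r.hitl_interactions for r in runs if r.completed)


# Illustrative data only; these are not results from the paper.
runs = [
    TaskRun(completed=True, hitl_interactions=0),
    TaskRun(completed=True, hitl_interactions=2),
    TaskRun(completed=False, hitl_interactions=1),
    TaskRun(completed=True, hitl_interactions=1),
]

for k in (0, 1, 2, math.inf):
    print(f"TCR@{k} = {tcr_at_k(runs, k):.2f}")
print("total HITL load =", total_hitl_load(runs))
```

Under these assumptions, $\mathsf{TCR}\text{@}0$ counts only tasks finished with no human involvement, and $\mathsf{TCR}\text{@}\infty$ reduces to the ordinary task completion rate with unlimited HITL load.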