Table of Contents
Fetching ...

Securing AI Agents with Information-Flow Control

Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

TL;DR

This work tackles the security challenges facing autonomous AI agents, notably indirect prompt injections, by employing information-flow control (IFC) to attach confidentiality and integrity labels to all data and tool interactions. It formalizes agent loops, develops a dynamic taint-tracking framework, and proves semantic guarantees such as non-interference for integrity and explicit secrecy for confidentiality. The authors introduce Fides, a refined IFC-powered planner that selectively hides data and uses constrained inspection to balance security with task utility, and demonstrate its effectiveness on the AgentDojo benchmark with policy checks that block most PIAs while preserving competitive task completion. Overall, the paper provides a formal model, a taxonomy of tasks, and empirical evidence that deterministic IFC can expand the set of securely solvable tasks for AI agents without prohibitive utility loss. These contributions advance practical, provable security for agent architectures in real-world, multi-tool settings.

Abstract

As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of properties enforceable by dynamic taint-tracking and construct a taxonomy of tasks to evaluate security and utility trade-offs of planner designs. Informed by this exploration, we present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information. Its evaluation in AgentDojo demonstrates that this approach enables us to complete a broad range of tasks with security guarantees. A tutorial to walk readers through the the concepts introduced in the paper can be found at https://github.com/microsoft/fides

Securing AI Agents with Information-Flow Control

TL;DR

This work tackles the security challenges facing autonomous AI agents, notably indirect prompt injections, by employing information-flow control (IFC) to attach confidentiality and integrity labels to all data and tool interactions. It formalizes agent loops, develops a dynamic taint-tracking framework, and proves semantic guarantees such as non-interference for integrity and explicit secrecy for confidentiality. The authors introduce Fides, a refined IFC-powered planner that selectively hides data and uses constrained inspection to balance security with task utility, and demonstrate its effectiveness on the AgentDojo benchmark with policy checks that block most PIAs while preserving competitive task completion. Overall, the paper provides a formal model, a taxonomy of tasks, and empirical evidence that deterministic IFC can expand the set of securely solvable tasks for AI agents without prohibitive utility loss. These contributions advance practical, provable security for agent architectures in real-world, multi-tool settings.

Abstract

As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of properties enforceable by dynamic taint-tracking and construct a taxonomy of tasks to evaluate security and utility trade-offs of planner designs. Informed by this exploration, we present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information. Its evaluation in AgentDojo demonstrates that this approach enables us to complete a broad range of tasks with security guarantees. A tutorial to walk readers through the the concepts introduced in the paper can be found at https://github.com/microsoft/fides

Paper Structure

This paper contains 74 sections, 2 theorems, 13 equations, 7 figures, 8 tables, 7 algorithms.

Key Result

Proposition 1

Algorithm alg:loop_taint with policies P-T and P-F correctly applied to every tool, guarantees non-interference for the integrity of tool calls and data, and explicit secrecy for the confidentiality of data.

Figures (7)

  • Figure 1: Overview of Fides. The agent loop receives a task from the user and orchestrates the interaction between the planner, the LLM, tools, and policy engine. Fides propagates labels in messages, actions, tool calls and results; it executes consequential actions proposed by the planner only if they satisfy a security policy, expressed in terms of these labels.
  • Figure 2: Product of the standard confidentiality and integrity lattices, with arrows indicating the direction of allowed flows.
  • Figure 3: Overall task completion rates for planners across all AgentDojo tasks when not under attack and no policy checks.
  • Figure 4: Utility comparison with and without policy checks for Basic and Fides. Solid bars show performance with policy; hollow dotted outlines show performance without policy (solid appears to fill the hollow bar).
  • Figure 5: Utility for Fides based on the reasoning models across different task categories with policy checks. DI represents data independent, DIQ represents data independent with query_llm, and DD represents data dependent.
  • ...and 2 more figures

Theorems & Definitions (8)

  • Proposition 1
  • Lemma 1
  • proof
  • Definition 1: Explicit knowledge
  • Definition 2: Explicit secrecy
  • Definition 3: Task
  • Definition 4: Data independence
  • Definition 5: Realizability