Agent-Sentry: Bounding LLM Agents via Execution Provenance

Rohan Sequeira, Stavros Damianakis, Umar Iqbal, Konstantinos Psounis

Abstract

Agentic computing systems, which autonomously spawn new functionalities based on natural language instructions, are becoming increasingly prevalent. While immensely capable, these systems raise serious security, privacy, and safety concerns. Fundamentally, the full set of functionalities offered by these systems, combined with their probabilistic execution flows, is not known beforehand. Given this lack of characterization, it is non-trivial to validate whether a system has successfully carried out the user's intended task or instead executed irrelevant actions, potentially as a consequence of compromise. In this paper, we propose Agent-Sentry, a framework that attempts to bound agentic systems to address this problem. Our key insight is that agentic systems are designed for specific use cases and therefore need not expose unbounded or unspecified functionalities. Once bounded, these systems become easier to scrutinize. Agent-Sentry operationalizes this insight by uncovering frequent functionalities offered by an agentic system, along with their execution traces, to construct behavioral bounds. It then learns a policy from these traces and blocks tool calls that deviate from learned behaviors or that misalign with user intent. Our evaluation shows that Agent-Sentry helps prevent over 90% of attacks that attempt to trigger out-of-bounds executions, while preserving up to 98% of system utility.

Paper Structure (76 sections, 6 figures, 13 tables)

Figures (6)

  • Figure 1: Agent-Sentry architecture in action. (1) A user submits a request that requires conditional tool execution, such as reading a file and transferring funds. (2) The agent's LLM proposes a tool call, which Agent-Sentry intercepts; only action tool calls are evaluated against the Functionality Graph. (3) For ambiguous or unseen flows, the Intent Alignment Mechanism verifies whether the proposed action is consistent with the original user request, using only trusted inputs: the user prompt, the tool call history, and the current tool call. It never sees untrusted retrieved content. (4)-(5) If the checks succeed, the tool is executed and its result is returned to the agent for continued execution. (6) If the execution flow is benign and the agent has completed all tasks, the agent responds to the user with the successful output. (7)-(8) If either the functionality graph analysis or the intent alignment check detects anomalous provenance or intent deviation, the action is blocked at the execution layer.
  • Figure 2: Utility success rate of Agent-Sentry as a function of functionality graph coverage on the Agent-Sentry Bench dataset.
  • Figure 3: Attack success rate of Agent-Sentry as a function of functionality graph coverage on the Agent-Sentry Bench dataset.
  • Figure 4: Excerpt from the Banking agent's functionality graphs, showing the Benign/Utility graphs.
  • Figure 5: Excerpt from the Banking agent's functionality graphs, showing the Ambiguous graphs.
  • ...and 1 more figures
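
The control flow described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the paper's implementation: the class and function names, the representation of a flow as a (tool call history, proposed tool) pair, the three-way benign/anomalous/ambiguous split, the example tool names, and the keyword-based intent check are all assumptions made for illustration. In practice the intent check would presumably be a stronger mechanism; the only property preserved here is that it receives trusted inputs only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

# Hypothetical representation: a flow is the tuple of prior tool names
# plus the name of the proposed tool.
Flow = Tuple[Tuple[str, ...], str]


@dataclass(frozen=True)
class ToolCall:
    name: str
    is_action: bool  # True for state-changing tools, False for read-only retrieval


@dataclass
class FunctionalityGraph:
    """Assumed stand-in for the learned behavioral bounds."""
    benign: Set[Flow] = field(default_factory=set)
    anomalous: Set[Flow] = field(default_factory=set)

    def classify(self, history: List[str], call: ToolCall) -> str:
        flow = (tuple(history), call.name)
        if flow in self.benign:
            return "benign"
        if flow in self.anomalous:
            return "anomalous"
        return "ambiguous"  # covers both ambiguous and unseen flows


# The intent check sees only trusted inputs: the user prompt, the tool call
# history, and the current tool call. No retrieved content is passed in.
IntentCheck = Callable[[str, List[str], ToolCall], bool]


def guard(user_prompt: str, history: List[str], call: ToolCall,
          graph: FunctionalityGraph, intent_aligned: IntentCheck) -> bool:
    """Return True if the proposed tool call may execute."""
    if not call.is_action:
        return True  # only action tool calls are evaluated against the graph
    verdict = graph.classify(history, call)
    if verdict == "benign":
        return True
    if verdict == "anomalous":
        return False  # blocked at the execution layer
    return intent_aligned(user_prompt, history, call)


def keyword_intent_check(user_prompt: str, history: List[str],
                         call: ToolCall) -> bool:
    # Toy placeholder: allow the call if its leading verb appears in the prompt.
    verb = call.name.split("_")[0]
    return verb in user_prompt.lower()


# Illustrative flows for a banking-style agent (tool names are made up).
graph = FunctionalityGraph(
    benign={(("read_file",), "send_money")},
    anomalous={(("read_file",), "update_password")},
)
prompt = "Read bill.txt and send the amount due."
history = ["read_file"]

allowed = guard(prompt, history, ToolCall("send_money", True),
                graph, keyword_intent_check)
blocked = guard(prompt, history, ToolCall("update_password", True),
                graph, keyword_intent_check)
unseen = guard(prompt, history, ToolCall("schedule_transaction", True),
               graph, keyword_intent_check)
```

In this sketch the graph lookup decides flows it recognizes, and only flows outside the learned sets are deferred to the intent check, so the more expensive check runs only when provenance alone is inconclusive.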