
AEGIS: No Tool Call Left Unchecked -- A Pre-Execution Firewall and Audit Layer for AI Agents

Aojie Yuan, Zhiyuan Su, Yue Zhao

Abstract

AI agents increasingly act through external tools: they query databases, execute shell commands, read and write files, and send network requests. Yet in most current agent stacks, model-generated tool calls are handed to the execution layer with no framework-agnostic control point in between. Post-execution observability can record these actions, but it cannot stop them before side effects occur. We present AEGIS, a pre-execution firewall and audit layer for AI agents. AEGIS interposes on the tool-execution path and applies a three-stage pipeline: (i) deep string extraction from tool arguments, (ii) content-first risk scanning, and (iii) composable policy validation. High-risk calls can be held for human approval, and all decisions are recorded in a tamper-evident audit trail based on Ed25519 signatures and SHA-256 hash chaining. In the current implementation, AEGIS supports 14 agent frameworks across Python, JavaScript, and Go with lightweight integration. On a curated suite of 48 attack instances, AEGIS blocks all attacks in the suite before execution; on 500 benign tool calls, it yields a 1.2% false positive rate; and across 1,000 consecutive interceptions, it adds 8.3 ms median latency. The live demo will show end-to-end interception of benign, malicious, and human-escalated tool calls, allowing attendees to observe real-time blocking, approval workflows, and audit-trail generation. These results suggest that pre-execution mediation for AI agents can be practical, low-overhead, and directly deployable.
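
To make the three-stage pipeline concrete, the following is a minimal Python sketch under stated assumptions. It is not the AEGIS implementation: the function names, the two regular expressions, and the `REVIEW_TOOLS` set are hypothetical and chosen only to illustrate extraction, scanning, and policy validation yielding an allow/block/pending decision.

```python
import re
from typing import Any, Iterable, Iterator, List

# Hypothetical risk patterns for illustration; real rule sets would be far richer.
RISK_PATTERNS = {
    "sql_injection": re.compile(r"\bdrop\s+table\b|\bor\s+1\s*=\s*1\b", re.IGNORECASE),
    "destructive_shell": re.compile(r"rm\s+-rf\s+/"),
}

# Hypothetical policy: tools whose calls are always held for human approval.
REVIEW_TOOLS = {"send_email"}


def extract_strings(value: Any) -> Iterator[str]:
    """Stage 1: recursively yield every string nested in the tool arguments."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from extract_strings(key)
            yield from extract_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from extract_strings(item)


def scan(strings: Iterable[str]) -> List[str]:
    """Stage 2: return the names of all risk patterns matched by any string."""
    hits = set()
    for text in strings:
        for name, pattern in RISK_PATTERNS.items():
            if pattern.search(text):
                hits.add(name)
    return sorted(hits)


def decide(tool_name: str, findings: List[str]) -> str:
    """Stage 3: map scan findings and tool identity to a decision."""
    if findings:
        return "block"
    if tool_name in REVIEW_TOOLS:
        return "pending"
    return "allow"


def check_tool_call(tool_name: str, args: Any) -> str:
    """Run extract -> scan -> policy before the tool is executed."""
    return decide(tool_name, scan(extract_strings(args)))
```

In this sketch the scan looks at argument content wherever it is nested, so a payload buried inside a list of dictionaries is treated the same way as one passed as a top-level string.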

Paper Structure (26 sections, 15 figures, 2 tables)

Figures (15)

  • Figure 1: Aegis overview. The SDK layer instruments 14 agent frameworks to intercept tool_use calls. The Gateway runs a three-stage pipeline (extract, scan, policy) producing allow/block/pending decisions. Pending calls route to the Compliance Cockpit for human review. All traces are logged to a tamper-evident audit trail with Ed25519 signatures and SHA-256 hash chaining.
  • Figure 2: Attack instances blocked per category. On the curated suite used in this paper, Aegis blocks all 48 attacks.
  • Figure 3: Illustrative comparison across 7 attack categories. AgentDojo and ToolEmu are evaluation-oriented systems, whereas Aegis performs runtime mediation.
  • Figure 4: Latency distribution over 1,000 tool calls. Median 8.3 ms, P95 14.7 ms, P99 23.1 ms, which is negligible (<1%) relative to LLM inference.
  • Figure 5: Live interception in the test agent UI. The user submits a SQL injection attack; Aegis blocks the call and the agent gracefully explains why the request was rejected.
  • ...and 10 more figures
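
The tamper-evident audit trail named in the abstract and in Figure 1 relies on SHA-256 hash chaining. The sketch below illustrates only the chaining idea, using Python's standard `hashlib`. The record layout and the helper names (`entry_hash`, `append_entry`, `verify_chain`) are assumptions, and the Ed25519 signing step is omitted because it would require a third-party library.

```python
import hashlib
import json
from typing import Any, Dict, List

# Assumed placeholder "previous hash" for the first entry in the chain.
GENESIS_HASH = "0" * 64


def entry_hash(prev_hash: str, record: Dict[str, Any]) -> str:
    """Hash the previous entry's hash together with a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict[str, Any]], record: Dict[str, Any]) -> None:
    """Append a record, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, record),
    })


def verify_chain(log: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous one, changing a logged decision after the fact invalidates that entry and every later one. Signing the entries (for example with Ed25519, as the paper describes) would additionally prevent an attacker from simply recomputing the whole chain.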