Governing Dynamic Capabilities: Cryptographic Binding and Reproducibility Verification for AI Agent Tool Use

Ziling Zhou

Abstract

AI agents dynamically acquire capabilities at runtime via MCP and A2A, yet no framework detects when capabilities change post-authorization. We term this the capability-identity gap: it enables silent capability escalation and violates EU AI Act traceability requirements. We propose three mechanisms. Capability-bound agent certificates extend X.509 v3 with a skills manifest hash; any tool change invalidates the certificate. Reproducibility commitments leverage LLM inference near-determinism for post-hoc replay verification. A verifiable interaction ledger provides hash-linked, signed records for multi-agent forensic reconstruction. We formalize nine security properties and prove they hold under a realistic adversary model. Our Rust prototype achieves 97 µs certificate verification (<1 ns capability binding overhead, ~1,200,000× faster than BAID's zkVM), 0.62 ms total governance overhead per tool call (0.1--1.2% of typical latency), and 4.7× separation from cross-provider outputs (Cohen's d > 1.0 on all four metrics), with best classification at $F_1 = 0.876$ (Jaccard, $\theta = 0.408$); single-provider deployments achieve $F_1 = 0.990$ with 11.5× separation. We evaluate 12 attack scenarios (silent escalation, tool trojanization, phantom delegation, evidence tampering, collusion, and runtime behavioral attacks validated against NVIDIA's Nemotron-AIQ traces); each is detected with a traceable mechanism, while the MCP+OAuth 2.1 baseline detects none. An end-to-end evaluation over a 5-to-20-agent pipeline with real LLM calls confirms that full governance (G1--G3) adds ~10.8 ms per pipeline run (0.12% overhead), scales sub-linearly per agent, and detects all five in-situ attacks with zero false positives.
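
The capability-binding idea in the abstract (a skills manifest hash carried in the certificate, so that any tool change invalidates it) can be illustrated with a minimal Python sketch. This is not the paper's Rust prototype: the manifest layout (`name`, `version`), the canonical JSON encoding, the choice of SHA-256, and the function names `skills_manifest_hash` and `capability_binding_ok` are all assumptions made for illustration, and the X.509 extension itself is not modeled.

```python
import hashlib
import json


def skills_manifest_hash(tools):
    """Hash a canonical JSON encoding of the tool list.

    Assumption: tools are dicts with a unique "name"; sorting by name and
    by key makes the hash independent of ordering.
    """
    canonical = json.dumps(
        sorted(tools, key=lambda t: t["name"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def capability_binding_ok(cert_manifest_hash, live_tools):
    """True only if the live tool set still matches the hash in the certificate."""
    return cert_manifest_hash == skills_manifest_hash(live_tools)


# Hypothetical tool set authorized at certificate issuance.
authorized = [
    {"name": "read_file", "version": "1.0"},
    {"name": "search", "version": "2.1"},
]
cert_hash = skills_manifest_hash(authorized)

# Same tools in a different order: the binding still holds.
reordered = list(reversed(authorized))

# A tool acquired after authorization: the binding breaks.
escalated = authorized + [{"name": "send_email", "version": "0.3"}]

print(capability_binding_ok(cert_hash, reordered))
print(capability_binding_ok(cert_hash, escalated))
```

Under these assumptions the check is a single hash comparison against a precomputed value, which is one plausible reading of why the abstract reports a very small binding overhead relative to full certificate verification.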

Paper Structure (45 sections, 12 theorems, 6 equations, 5 figures, 12 tables)

Key Result

Theorem 1

If compromised agent $v$ executes model $M' \neq M$ or skills $S' \neq S$, replay verification detects the deviation with probability $\geq 1 - \epsilon(\lambda)$ for negligible $\epsilon$.
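
As a rough illustration of replay verification, the sketch below compares a committed output with a replayed one using Jaccard similarity and the threshold $\theta = 0.408$ quoted in the abstract. Several things here are assumptions rather than the paper's method: whitespace tokenization, treating $\theta$ as a lower bound on similarity, and the names `jaccard` and `replay_consistent`. It shows only the empirical classification step and does not demonstrate the probabilistic guarantee of Theorem 1.

```python
THETA = 0.408  # threshold quoted in the abstract; its exact use is assumed here


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens (an assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty outputs are treated as identical
    return len(ta & tb) / len(ta | tb)


def replay_consistent(committed: str, replayed: str, theta: float = THETA) -> bool:
    """Accept the replay if it is at least theta-similar to the committed output."""
    return jaccard(committed, replayed) >= theta


# Hypothetical outputs for illustration only.
committed = "the invoice total is 42 dollars"
same_model_replay = "the invoice total is 42 dollars exactly"
other_output = "transfer funds to account 9913 now"

print(replay_consistent(committed, same_model_replay))
print(replay_consistent(committed, other_output))
```

The first pair shares six of seven distinct tokens and is accepted; the second pair shares none and is flagged as a deviation.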

Figures (5)

  • Figure 1: The four-layer agent security landscape and the capability-identity gap. Existing layers partially address individual requirements but none bridges them. Our framework provides the missing cryptographic governance layer satisfying G1--G3, validated end-to-end across 5--20 agent pipelines.
  • Figure 2: Trust propagation tree for the evaluation environment. NA nodes (rectangles) form the trust anchor; AG nodes (rounded) are agents. Constraints decay monotonically downward.
  • Figure 3: Verification flow for an MCP tool call through the A2Auth governance layer. Each phase can independently reject with a specific reason.
  • Figure 4: Chain Verifiability: a single interior agent with G2 = none at position $k$ breaks behavioral verification for all downstream nodes, even if they individually satisfy G2 = full. The verifiable region extends only to $\mathrm{CVD}(v_n) = k$.
  • Figure 5: End-to-end multi-agent pipeline architecture with governance overlay. Five specialist agents coordinate through a trust tree (depth 3). Red labels indicate E2E attack injection points (see the E2E attacks table). All inter-agent communications produce bilaterally-signed ledger records.
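
The chain verifiability behavior described in the Figure 4 caption can be sketched as a short function. This is one reading of the caption, not the paper's Definition 7: positions are assumed to be 0-based, only the level `"none"` is assumed to break verification, a chain with no such agent is assumed to have depth equal to its length, and the name `chain_verifiability_depth` is hypothetical.

```python
def chain_verifiability_depth(g2_levels):
    """Return the index of the first agent with G2 = "none".

    Under the 0-based assumption this equals the number of leading agents
    whose behavior can still be verified. If no agent has G2 = "none",
    the whole chain is treated as verifiable.
    """
    for k, level in enumerate(g2_levels):
        if level == "none":
            return k
    return len(g2_levels)


# Hypothetical five-agent chain with one non-verifiable interior agent.
chain = ["full", "full", "none", "full", "full"]
print(chain_verifiability_depth(chain))

# A chain in which every agent satisfies G2 = full.
print(chain_verifiability_depth(["full"] * 4))
```

In the first chain the two downstream agents satisfy G2 = full individually, yet they fall outside the verifiable region because the agent at position 2 breaks the chain.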

Theorems & Definitions (30)

  • Definition 1: Agent Governance Requirements
  • Definition 2: Capability-Bound Certificate
  • Definition 3: Skills Manifest Hash
  • Definition 4: Trust Constraint Ordering
  • Definition 5: Reproducibility Commitment
  • Theorem 1: Reproducibility Soundness (informal)
  • Definition 6: $(n, \epsilon)$-Indistinguishability
  • Theorem 2: Bounded Divergence
  • Definition 7: Chain Verifiability Depth
  • Theorem 3: Chain Verifiability
  • ...and 20 more