Table of Contents
Fetching ...

Agentproof: Static Verification of Agent Workflow Graphs

Melwin Xavier, Vaisakh M A, Melveena Jolly, Midhun Xavier

Abstract

Agent frameworks increasingly encode tool-using behavior as explicit workflow graphs, yet safety enforcement remains a runtime concern. These frameworks expose analyzable graph structure through their APIs, enabling pre-deployment static verification of safety properties that runtime guardrails can only check reactively. This paper presents Agentproof, a system that automatically extracts a unified abstract graph model from four major agent frameworks (LangGraph, CrewAI, AutoGen, Google ADK), applies six structural checks with witness trace generation, and evaluates temporal safety policies via a DSL compiled to deterministic finite automata, both statically through a graph x DFA product construction and at runtime over event traces. Unlike general-purpose model checkers, Agentproof requires no manual modeling. In a curated benchmark of 18 author-constructed workflows, 27% of the benchmark contain structural defects (dead-end nodes, unreachable exits) and 55% violate a human-gate policy when enforced, distinct categories that prior work conflates. All 15 temporal policies defined fit within the seven-form DSL fragment, and verification completes in sub-second time for graphs up to 5,000 nodes. The corpus serves as a reproducible benchmark for evaluating static verification tools rather than as a prevalence study; defect rates reflect tool detection capability on a targeted benchmark, not base rates in production systems. Nonetheless, static graph verification complements runtime guardrails by catching topology-level defects that runtime tools miss unless the offending path is exercised.

Agentproof: Static Verification of Agent Workflow Graphs

Abstract

Agent frameworks increasingly encode tool-using behavior as explicit workflow graphs, yet safety enforcement remains a runtime concern. These frameworks expose analyzable graph structure through their APIs, enabling pre-deployment static verification of safety properties that runtime guardrails can only check reactively. This paper presents Agentproof, a system that automatically extracts a unified abstract graph model from four major agent frameworks (LangGraph, CrewAI, AutoGen, Google ADK), applies six structural checks with witness trace generation, and evaluates temporal safety policies via a DSL compiled to deterministic finite automata, both statically through a graph x DFA product construction and at runtime over event traces. Unlike general-purpose model checkers, Agentproof requires no manual modeling. In a curated benchmark of 18 author-constructed workflows, 27% of the benchmark contain structural defects (dead-end nodes, unreachable exits) and 55% violate a human-gate policy when enforced, distinct categories that prior work conflates. All 15 temporal policies defined fit within the seven-form DSL fragment, and verification completes in sub-second time for graphs up to 5,000 nodes. The corpus serves as a reproducible benchmark for evaluating static verification tools rather than as a prevalence study; defect rates reflect tool detection capability on a targeted benchmark, not base rates in production systems. Nonetheless, static graph verification complements runtime guardrails by catching topology-level defects that runtime tools miss unless the offending path is exercised.
Paper Structure (78 sections, 9 theorems, 7 figures, 10 tables)

This paper contains 78 sections, 9 theorems, 7 figures, 10 tables.

Key Result

Lemma 1

If $\mathrm{ExitReach}(G)$ holds (i.e., the BFS/DFS from $v_0$ visits all of $V_f$), then for every exit node $v_f \in V_f$, there exists a directed path from $v_0$ to $v_f$ in $G$.

Figures (7)

  • Figure 1: Deployment model for T2: Agentproof as a CI/CD verification gate. The workflow definition is untrusted; the extractor, verifier, and pipeline are within the trusted computing base.
  • Figure 2: Agentproof pipeline: extract a framework workflow into an abstract graph model, run structural verification, and compile temporal policies into monitors for trace evaluation.
  • Figure 3: Example extracted workflow graph with typed nodes (ENTRY/ROUTER/LLM/TOOL/HUMAN/EXIT).
  • Figure 4: Illustrative DFA for an $a \to \mathbf{F}\, b$ interleaving constraint: after $a$ occurs, $b$ must occur before $a$ repeats.
  • Figure 5: Defect distribution across frameworks, grouped by defect type. The dashed line separates structural defects (left) from policy violations (right).
  • ...and 2 more figures

Theorems & Definitions (22)

  • Definition 1: Agent Workflow Graph
  • Definition 2: Execution trace
  • Definition 3: Structural predicates
  • Definition 4: Extraction correctness
  • Lemma 1: Exit Reachability Soundness
  • proof
  • Lemma 2: Dead-End Soundness
  • proof
  • Lemma 3: Router Shape Soundness
  • proof
  • ...and 12 more