Towards Enforcing Company Policy Adherence in Agentic Workflows

Naama Zwerdling, David Boaz, Ella Rabinovich, Guy Uziel, David Amid, Ateret Anaby-Tavor

TL;DR

The paper tackles the problem of reliably enforcing company policies in agentic workflows powered by large language models. It introduces a deterministic, two-phase framework: an offline buildtime phase, in which the Tool-Policy Mapper maps policies onto tools and the resulting mapping is used to generate guard code (ToolGuards), and a runtime phase, in which the guards execute before each tool invocation. In an evaluation on the $\tau$-bench Airlines domain, deploying ToolGuards improves policy adherence by more than 20 percentage points over the original, unguarded run, which the authors present as encouraging preliminary results. The work also describes the framework's lifecycle and evaluation methods and outlines the challenges that remain for real-world deployment.

Abstract

Large Language Model (LLM) agents hold promise for a flexible and scalable alternative to traditional business process automation, but struggle to reliably follow complex company policies. In this study, we introduce a deterministic, transparent, and modular framework for enforcing business policy adherence in agentic workflows. Our method operates in two phases: (1) an offline buildtime stage that compiles policy documents into verifiable guard code associated with tool use, and (2) a runtime integration where these guards ensure compliance before each agent action. We demonstrate our approach on the challenging $\tau$-bench Airlines domain, showing encouraging preliminary results in policy enforcement, and further outline key challenges for real-world deployments.
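The runtime phase described above, in which guards run before each agent action, can be pictured with a minimal Python sketch. This is an illustration only and not the paper's generated code: the names `PolicyViolation`, `guarded`, `max_passengers_guard`, and `book_reservation`, as well as the five-passenger rule itself, are hypothetical and were chosen just to make the control flow concrete.

```python
from typing import Any, Callable, Dict, List


class PolicyViolation(Exception):
    """Raised when a guard rejects a tool call (hypothetical name)."""


# A guard receives the tool arguments and a system snapshot, and raises
# PolicyViolation if the policy does not hold. It returns nothing on success.
Guard = Callable[[Dict[str, Any], Dict[str, Any]], None]


def max_passengers_guard(args: Dict[str, Any], snapshot: Dict[str, Any]) -> None:
    # Hypothetical policy used for illustration: at most five passengers.
    if len(args.get("passengers", [])) > 5:
        raise PolicyViolation("A reservation may include at most five passengers.")


def guarded(
    tool: Callable[..., Any],
    guards: List[Guard],
    snapshot_fn: Callable[[], Dict[str, Any]],
) -> Callable[..., Any]:
    """Wrap a tool so that every guard runs before the tool itself."""

    def wrapper(**args: Any) -> Any:
        snapshot = snapshot_fn()  # read the system state once per call
        for guard in guards:
            guard(args, snapshot)  # any violation aborts before the tool runs
        return tool(**args)

    return wrapper


def book_reservation(passengers: List[str]) -> str:
    # Stand-in for a real tool; assumed for this sketch.
    return f"booked {len(passengers)} passenger(s)"


# The snapshot function returns an empty dict here because the example
# policy depends only on the tool arguments.
safe_book = guarded(book_reservation, [max_passengers_guard], lambda: {})
```

Because the check is ordinary code executed at the tool invocation point, a violating call is rejected deterministically, regardless of what the LLM agent decided to do.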

Paper Structure

This paper contains 34 sections, 6 figures, 4 tables.

Figures (6)

  • Figure 1: During the offline buildtime step (left), policy documents, along with the system schema and tool specifications, are compiled into a compact, easily interpretable mapping of concrete policies onto tools (editable JSON in this case), which in turn is used to generate ToolGuards: code that verifies that policies hold given a system snapshot. ToolGuards are integrated into the agentic runtime at the tool invocation point (right). We mark two possible points of domain-expert intervention: reviewing the textual Tool-Policy Mapper outcome, and reviewing the generated ToolGuards code.
  • Figure 2: $\tau$-bench Airlines benchmark evaluation. Deploying ToolGuards results in a steady improvement of over 20 percentage points compared to the original run.
  • Figure 3: LangGraph nodes in the Tool-Policy Mapper.
  • Figure 4: Interface view illustrating how policy document sections are assigned to specific tools.
  • Figure 5: Example of a prompt used to generate compliance and violation examples for a target tool.
  • ...and 1 more figures