Solver-Aided Verification of Policy Compliance in Tool-Augmented LLM Agents

Cailin Winston; Claris Winston; René Just

Abstract

Tool-augmented Large Language Models (TaLLMs) extend LLMs with the ability to invoke external tools, enabling them to interact with real-world environments. However, a major limitation in deploying TaLLMs in sensitive applications such as customer service and business process automation is the lack of reliable compliance with domain-specific operational policies governing tool use and agent behavior. Current approaches merely steer LLMs toward policy adherence by including policy descriptions in the LLM context, which provides no guarantee that policy violations will be prevented. In this paper, we introduce an SMT solver-aided framework to enforce tool-use policy compliance in TaLLM agents. Specifically, we use an LLM-assisted, human-guided approach to translate tool-use policies specified in natural language into formal logic (SMT-LIB 2.0) constraints over agent-observable state and tool arguments. At runtime, planned tool calls are intercepted and checked against the constraints using the Z3 solver as a pre-condition to the tool call. Tool invocations that violate the policy are blocked. We evaluate the framework on the TauBench benchmark and demonstrate that solver-aided policy checking reduces policy violations while maintaining overall task accuracy. These results suggest that integrating formal reasoning into TaLLM execution can improve tool-call policy compliance and overall reliability.
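
The pre-condition check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cancellation policy, the variable names (`hours_since_booking`, `has_insurance`, `cancel_allowed`), and the helper functions are all hypothetical and chosen only for illustration. The sketch pins the agent-observable state to concrete values, asserts the policy, and asks Z3 for a verdict through the `z3` command-line binary if one happens to be installed.

```python
import shutil
import subprocess

# Hypothetical policy (not taken from the paper or from TauBench): a reservation
# may be cancelled only within 24 hours of booking or if insurance was bought.
POLICY_SMT2 = """\
(declare-const hours_since_booking Int)
(declare-const has_insurance Bool)
(define-fun cancel_allowed () Bool
  (or (<= hours_since_booking 24) has_insurance))
"""


def to_smt2_value(value):
    """Render a Python bool or int as an SMT-LIB literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value) if value >= 0 else f"(- {-value})"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def build_check_script(policy, state):
    """Pin each state variable to its observed value, then assert the policy."""
    bindings = "\n".join(
        f"(assert (= {name} {to_smt2_value(value)}))"
        for name, value in state.items()
    )
    return f"{policy}{bindings}\n(assert cancel_allowed)\n(check-sat)\n"


def is_allowed(script):
    """Return True if Z3 reports sat, False otherwise, None if z3 is not on PATH."""
    z3_binary = shutil.which("z3")
    if z3_binary is None:
        return None
    result = subprocess.run(
        [z3_binary, "-smt2", "-in"], input=script, capture_output=True, text=True
    )
    return result.stdout.strip() == "sat"


if __name__ == "__main__":
    # Planned tool call: cancel a reservation booked 30 hours ago, no insurance.
    script = build_check_script(
        POLICY_SMT2, {"hours_since_booking": 30, "has_insurance": False}
    )
    print(script)
    print("allowed:", is_allowed(script))
```

Because every state variable is pinned to a concrete value, a `sat` answer means the observed state satisfies the policy and the call may proceed, while `unsat` means the call should be blocked. An interceptor built along these lines would run this check before dispatching each write tool call.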

Paper Structure (17 sections, 4 figures)

Introduction
Logical Encoding of Natural Language Policies
Policy Encoding
Tool Validation Schema
Solver-Aided Policy Compatibility Checking
Runtime Tool-Call Interception
Agent Planning Control
Evaluation Setup
Results
RQ1: Formal Logic Policy Translation
RQ2: Reduction in Policy-Violating Tool Calls
RQ3: Impact on Overall Task Accuracy and Consistency
Related Work
Tool Augmented LLMs
Policy Enforcement in LLM Agents
...and 2 more sections

Figures (4)

Figure 1: Tool-augmented LLM with tool-call policy checker. The framework integrates an SMT-based policy checker into the tool-call execution loop, blocking tool invocations that violate specified tool-use policies.
Figure 2: Flow diagram of the policy compatibility checker validating whether a cancellation tool call may be invoked.
Figure 3: Precision and recall of write tool calls for baseline TaLLM and the TaLLM with policy checker.
Figure 4: The pass^k scores for the baseline TaLLM and the TaLLM with policy checker.
