BEAVER: An Efficient Deterministic LLM Verifier
Tarun Suresh, Nalin Wadhwa, Debangshu Banerjee, Gagandeep Singh
TL;DR
BEAVER tackles the lack of deterministic guarantees for LLM constraint satisfaction by introducing a frontier-driven, prefix-closed verification framework. It leverages a token trie and a frontier to incrementally compute sound lower and upper bounds on constraint-satisfaction probability, with a branch-and-bound style algorithm that tightens bounds anytime during execution. The approach demonstrates substantial tightening over rejection sampling across GSM-Symbolic, Enron leakage, and CyberSecEval tasks, and provides actionable risk assessment by exposing high-risk instances with tight bounds. The work also analyzes runtime and decoding-parameter effects, discusses limitations, and releases open-source tooling for practical deployment in safety-critical settings.
Abstract
As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify that model outputs satisfy required constraints. While sampling-based estimates provide an intuition of model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM constraint satisfaction. Given any prefix-closed semantic constraint, BEAVER systematically explores the generation space using novel token trie and frontier data structures, maintaining provably sound bounds at every iteration. We formalize the verification problem, prove soundness of our approach, and evaluate BEAVER on correctness verification, privacy verification and secure code generation tasks across multiple state of the art LLMs. BEAVER achieves 6 to 8 times tighter probability bounds and identifies 3 to 4 times more high risk instances compared to baseline methods under identical computational budgets, enabling precise characterization and risk assessment that loose bounds or empirical evaluation cannot provide.
