Table of Contents
Fetching ...

Verifying Memoryless Sequential Decision-making of Large Language Models

Dennis Gross, Helge Spieker, Arnaud Gotlieb

TL;DR

The paper addresses verifying memoryless LLM-based policies in sequential decision-making tasks modeled as MDPs against $PCTL$ safety specifications. It presents a framework that incrementally builds the reachable MDP guided by the LLM's actions, encodes each state as a natural-language prompt, and uses an LLM output parsed into actions, with Storm performing exact verification. The key contributions include integrating Ollama with PRISM/Storm for automated, reachable-state verification and demonstrating feasibility on standard grid-world benchmarks, while highlighting current performance gaps relative to deep RL baselines. This work provides a practical foundation for formal benchmarking and safety assurance as LLM capabilities evolve, enabling continuous verification of increasingly capable decision-making policies.

Abstract

We introduce a tool for rigorous and automated verification of large language model (LLM)- based policies in memoryless sequential decision-making tasks. Given a Markov decision process (MDP) representing the sequential decision-making task, an LLM policy, and a safety requirement expressed as a PCTL formula, our approach incrementally constructs only the reachable portion of the MDP guided by the LLM's chosen actions. Each state is encoded as a natural language prompt, the LLM's response is parsed into an action, and reachable successor states by the policy are expanded. The resulting formal model is checked with Storm to determine whether the policy satisfies the specified safety property. In experiments on standard grid world benchmarks, we show that open source LLMs accessed via Ollama can be verified when deterministically seeded, but generally underperform deep reinforcement learning baselines. Our tool natively integrates with Ollama and supports PRISM-specified tasks, enabling continuous benchmarking in user-specified sequential decision-making tasks and laying a practical foundation for formally verifying increasingly capable LLMs.

Verifying Memoryless Sequential Decision-making of Large Language Models

TL;DR

The paper addresses verifying memoryless LLM-based policies in sequential decision-making tasks modeled as MDPs against safety specifications. It presents a framework that incrementally builds the reachable MDP guided by the LLM's actions, encodes each state as a natural-language prompt, and uses an LLM output parsed into actions, with Storm performing exact verification. The key contributions include integrating Ollama with PRISM/Storm for automated, reachable-state verification and demonstrating feasibility on standard grid-world benchmarks, while highlighting current performance gaps relative to deep RL baselines. This work provides a practical foundation for formal benchmarking and safety assurance as LLM capabilities evolve, enabling continuous verification of increasingly capable decision-making policies.

Abstract

We introduce a tool for rigorous and automated verification of large language model (LLM)- based policies in memoryless sequential decision-making tasks. Given a Markov decision process (MDP) representing the sequential decision-making task, an LLM policy, and a safety requirement expressed as a PCTL formula, our approach incrementally constructs only the reachable portion of the MDP guided by the LLM's chosen actions. Each state is encoded as a natural language prompt, the LLM's response is parsed into an action, and reachable successor states by the policy are expanded. The resulting formal model is checked with Storm to determine whether the policy satisfies the specified safety property. In experiments on standard grid world benchmarks, we show that open source LLMs accessed via Ollama can be verified when deterministically seeded, but generally underperform deep reinforcement learning baselines. Our tool natively integrates with Ollama and supports PRISM-specified tasks, enabling continuous benchmarking in user-specified sequential decision-making tasks and laying a practical foundation for formally verifying increasingly capable LLMs.

Paper Structure

This paper contains 25 sections, 5 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Autoregressive text generation process via an transformer. The transformer processes the input text, producing a probability distribution over the potential next tokens using its neural network. A token is stochastically selected from this distribution (in our setting, we force deterministic behavior by setting seeds) and added back into the text, which is then re-fed into the transformer, continuing the iterative text generation process.
  • Figure 2: This diagram represents an RL system in which an agent interacts with an environment. The agent receives a state and a reward from the environment based on its previous action. The agent then uses this information to select the next action, which it sends to the environment.
  • Figure 3: Model checking workflow DBLP:journals/sttt/HenselJKQV22. First, the system needs to be formally modeled, for instance, via PRISM. Then, the requirements are formalized, for instance, via PCTL. Eventually, both are inputted into the model checker, like Storm, which verifies the property.
  • Figure 4: A user provides the MDP $M$ of the sequential decision-making task, an LLM policy $\pi(s)$ with $I(s)$ and $O(\omega)$, and a safety PCTL property to get a model checking result.
  • Figure 5: Incrementally built induced DTMC in blue, consisting of the states that are reachable by the LLM policy. Black states are the states of the MDP that are not reachable by the policy and therefore not of interest.

Theorems & Definitions (7)

  • Definition 1: String
  • Definition 2: Large-Language-Model Function
  • Definition 3: MDP
  • Definition 4
  • Example 5.1: Frozen Lake LLM I(s)
  • Example 5.2: Taxi LLM I(s)
  • Example 5.3: Stock Market LLM I(s)