Formally Verified Approximate Policy Iteration

Maximilian Schäffeler; Mohammad Abdulaziz

Formally Verified Approximate Policy Iteration

Maximilian Schäffeler, Mohammad Abdulaziz

TL;DR

The paper tackles the challenge of formally verifying an approximate policy iteration algorithm for factored MDPs using Isabelle/HOL, and then derives a verified executable implementation that includes a certified LP solver. It introduces a refinement workflow based on locales to connect abstract specifications to concrete code, and demonstrates a Scala-exportable pipeline with LP certificate checking for practical deployment. Empirical evaluation on a ring-network domain shows the approach scales to hundreds of thousands of states and confirms convergence within a handful of iterations, while also revealing solver limitations. The work advances rigorous verification for planning under uncertainty, offering reusable formal foundations and a concrete pathway to trustworthy, verifiable AI solvers for large, structured MDPs.

Abstract

We formally verify an algorithm for approximate policy iteration on Factored Markov Decision Processes using the interactive theorem prover Isabelle/HOL. Next, we show how the formalized algorithm can be refined to an executable, verified implementation. The implementation is evaluated on benchmark problems to show its practicability. As part of the refinement, we develop verified software to certify Linear Programming solutions. The algorithm builds on a diverse library of formalized mathematics and pushes existing methodologies for interactive theorem provers to the limits. We discuss the process of the verification project and the modifications to the algorithm needed for formal verification.

Formally Verified Approximate Policy Iteration

TL;DR

Abstract

Paper Structure (27 sections, 1 theorem, 11 equations, 2 figures, 2 tables)

This paper contains 27 sections, 1 theorem, 11 equations, 2 figures, 2 tables.

Introduction
Background
Isabelle/HOL
Factored MDPs
Linear Value Functions
Approximate Policy Iteration
Policy Improvement
Factored Bellman Error
Branch Error
Variable Elimination
Value Determination
LPs for Branches
Factored LP Construction
Code Generation
Refinement using Locales
...and 12 more sections

Key Result

Theorem 1

Let $api(w_0, \mathit{\pi}_0) = (t', \mathit{\pi}, w, \mathit{err}, \mathit{True})$. Then $(1 - \mathit{\gamma}) \| \mathit{\nu}^* - \mathit{\nu}_{w} \| \le 2\mathit{\gamma}\cdot\mathit{err}$.

Figures (2)

Figure 1: (a) Ring network of 4 connected computers. (b) Variable dependencies of the default action. (c) Probabilities of machine $C_1$ working in the next step, for every state of the machines $C_1$ and $C_4$.
Figure 2: Simplified Locale Hierarchy

Theorems & Definitions (1)

Theorem 1

Formally Verified Approximate Policy Iteration

TL;DR

Abstract

Formally Verified Approximate Policy Iteration

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (2)

Theorems & Definitions (1)