Table of Contents
Fetching ...

Policies Grow on Trees: Model Checking Families of MDPs

Roman Andriushchenko, Milan Češka, Sebastian Junges, Filip Macák

TL;DR

The paper tackles synthesis of winning policies for a family of MDPs to capture design-time variations, formalized via quotient MDPs and flexible policy trees. It introduces a novel game-based abstraction over stochastic games to identify robust policies for subfamilies and then refines the decomposition recursively to cover the entire family. The approach yields compact policy trees and demonstrates scalability to millions of MDPs, achieving substantial speedups over naive baselines and enabling practical use on large, varied model sets (e.g., 246 winning policies for 94 million MDPs in a benchmark within under 30 minutes). The work provides a principled framework for modeling, analyzing, and representing robust policies across families of MDPs, with potential impact on design-time verification and configurable systems.

Abstract

Markov decision processes (MDPs) provide a fundamental model for sequential decision making under process uncertainty. A classical synthesis task is to compute for a given MDP a winning policy that achieves a desired specification. However, at design time, one typically needs to consider a family of MDPs modelling various system variations. For a given family, we study synthesising (1) the subset of MDPs where a winning policy exists and (2) a preferably small number of winning policies that together cover this subset. We introduce policy trees that concisely capture the synthesis result. The key ingredient for synthesising policy trees is a recursive application of a game-based abstraction. We combine this abstraction with an efficient refinement procedure and a post-processing step. An extensive empirical evaluation demonstrates superior scalability of our approach compared to naive baselines. For one of the benchmarks, we find 246 winning policies covering 94 million MDPs. Our algorithm requires less than 30 minutes, whereas the naive baseline only covers 3.7% of MDPs in 24 hours.

Policies Grow on Trees: Model Checking Families of MDPs

TL;DR

The paper tackles synthesis of winning policies for a family of MDPs to capture design-time variations, formalized via quotient MDPs and flexible policy trees. It introduces a novel game-based abstraction over stochastic games to identify robust policies for subfamilies and then refines the decomposition recursively to cover the entire family. The approach yields compact policy trees and demonstrates scalability to millions of MDPs, achieving substantial speedups over naive baselines and enabling practical use on large, varied model sets (e.g., 246 winning policies for 94 million MDPs in a benchmark within under 30 minutes). The work provides a principled framework for modeling, analyzing, and representing robust policies across families of MDPs, with potential impact on design-time verification and configurable systems.

Abstract

Markov decision processes (MDPs) provide a fundamental model for sequential decision making under process uncertainty. A classical synthesis task is to compute for a given MDP a winning policy that achieves a desired specification. However, at design time, one typically needs to consider a family of MDPs modelling various system variations. For a given family, we study synthesising (1) the subset of MDPs where a winning policy exists and (2) a preferably small number of winning policies that together cover this subset. We introduce policy trees that concisely capture the synthesis result. The key ingredient for synthesising policy trees is a recursive application of a game-based abstraction. We combine this abstraction with an efficient refinement procedure and a post-processing step. An extensive empirical evaluation demonstrates superior scalability of our approach compared to naive baselines. For one of the benchmarks, we find 246 winning policies covering 94 million MDPs. Our algorithm requires less than 30 minutes, whereas the naive baseline only covers 3.7% of MDPs in 24 hours.
Paper Structure (60 sections, 4 theorems, 7 equations, 6 figures, 3 tables, 1 algorithm)

This paper contains 60 sections, 4 theorems, 7 equations, 6 figures, 3 tables, 1 algorithm.

Key Result

lemma 1

Consider family $\mathcal{M}\xspace= \{M_i\}_{i \in \mathcal{I}\xspace}$ of MDPs and its quotient MDP $\mathcal{Q}\xspace_\mathcal{M}\xspace$. and thus for the threshold $\geq \lambda$ and all $M_i$, it holds that ${\mathcal{P}\xspace(M_i) = \varnothing\xspace}$Similarly, $\mathbb{P}_{\min}\left[ \mathcal{Q}\xspace_\mathcal{M}\xspace \models \Diamond T\right ] \geq \lambda$ implies $\forall M_i \

Figures (6)

  • Figure 1: Illustrative example for a set of MDPs and a policy tree
  • Figure 2: (a) Family $\mathcal{M}\xspace$ of two MDPs. (b) Quotient MDP $\mathcal{Q}\xspace_\mathcal{M}\xspace$ for the family $\mathcal{M}\xspace$. (c) Game abstraction for family $\mathcal{M}\xspace$. Circles (squares) denote states of Player 1 (2). Transitions without explicit probability have a probability of 1. States $s_T$ and $s_F$ are absorbing, and their action with self-loop is omitted.
  • Figure 3: Quotient (left) and game abstraction (right) for a simple family of MDPs.
  • Figure 4: Policy tree before (left) and after (right) post-processing.
  • Figure 5: Running example from Fig. \ref{['fig:running-example:grid']} encoded as a Prism program sketch. "Hole" variables OX and OY can be instantiated to one of the values in their specified domain, and when all holes are instantiated, we obtain one concrete MDP.
  • ...and 1 more figures

Theorems & Definitions (15)

  • definition 1: MDP
  • definition 2: Family of MDPs
  • definition 3: Policy map
  • definition 4: Robust policy
  • definition 5: Quotient MDP
  • lemma 1
  • definition 6: Policy tree
  • definition 7: Stochastic game
  • definition 8: Game abstraction
  • lemma 2
  • ...and 5 more