Policies Grow on Trees: Model Checking Families of MDPs
Roman Andriushchenko, Milan Češka, Sebastian Junges, Filip Macák
TL;DR
The paper addresses the synthesis of winning policies for a family of MDPs that models design-time system variations. It formalizes the family via a quotient MDP and represents the synthesis result as a policy tree. The core technique is a game-based abstraction, a stochastic game, that identifies policies which are robust across a whole subfamily; where no such policy is found, the subfamily is split and the procedure is applied recursively until the entire family is covered. The resulting policy trees are compact, and the approach scales to millions of MDPs: on one benchmark it finds 246 winning policies covering 94 million MDPs in less than 30 minutes, whereas the naive baseline covers only 3.7% of the MDPs in 24 hours. The framework is relevant to design-time verification and to configurable systems.
Abstract
Markov decision processes (MDPs) provide a fundamental model for sequential decision making under process uncertainty. A classical synthesis task is to compute for a given MDP a winning policy that achieves a desired specification. However, at design time, one typically needs to consider a family of MDPs modelling various system variations. For a given family, we study synthesising (1) the subset of MDPs where a winning policy exists and (2) a preferably small number of winning policies that together cover this subset. We introduce policy trees that concisely capture the synthesis result. The key ingredient for synthesising policy trees is a recursive application of a game-based abstraction. We combine this abstraction with an efficient refinement procedure and a post-processing step. An extensive empirical evaluation demonstrates superior scalability of our approach compared to naive baselines. For one of the benchmarks, we find 246 winning policies covering 94 million MDPs. Our algorithm requires less than 30 minutes, whereas the naive baseline only covers 3.7% of MDPs in 24 hours.
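The recursive scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the game-based abstraction is replaced by a brute-force stand-in (`toy_oracle`) that enumerates every member of a subfamily, and the names `members`, `split`, `build_tree`, `leaves`, and the toy `wins` predicate are all hypothetical. The sketch only shows the shape of the recursion: ask an oracle whether one policy wins on the whole subfamily, whether no member has a winning policy, or whether the answer is inconclusive, and split the subfamily in the last case.

```python
from itertools import product

def members(family):
    """Enumerate all parameter assignments (one per MDP) in a family."""
    names = sorted(family)
    for values in product(*(family[n] for n in names)):
        yield dict(zip(names, values))

def toy_oracle(family, policies, wins):
    """Brute-force stand-in for the game-based abstraction (assumption).

    Returns ("policy", p) if p wins on every member, ("unsat", None) if no
    policy wins on any member, and ("unknown", None) otherwise.
    """
    ms = list(members(family))
    for p in policies:
        if all(wins(p, m) for m in ms):
            return ("policy", p)
    if not any(wins(p, m) for p in policies for m in ms):
        return ("unsat", None)
    return ("unknown", None)

def split(family):
    """Halve the domain of a parameter with the largest domain (a naive heuristic)."""
    name = max(sorted(family), key=lambda n: len(family[n]))
    dom = family[name]
    mid = len(dom) // 2
    return name, [{**family, name: dom[:mid]}, {**family, name: dom[mid:]}]

def build_tree(family, oracle):
    """Recursively build a policy tree: leaves are conclusive, inner nodes split."""
    verdict, policy = oracle(family)
    if verdict != "unknown":
        return {"family": family, "verdict": verdict, "policy": policy}
    name, parts = split(family)
    return {"family": family, "split_on": name,
            "children": [build_tree(part, oracle) for part in parts]}

def leaves(tree):
    """Collect the leaves of a policy tree, left to right."""
    if "children" not in tree:
        return [tree]
    return [leaf for child in tree["children"] for leaf in leaves(child)]

# Toy family: 4 * 2 = 8 "MDPs", identified only by their parameter values.
family = {"x": [0, 1, 2, 3], "y": [0, 1]}
policies = ["left", "right"]

def wins(policy, m):
    # Hypothetical winning condition, chosen only to make the example non-trivial.
    if policy == "left":
        return m["x"] <= 1
    return m["x"] >= 2 and m["y"] == 0

tree = build_tree(family, lambda f: toy_oracle(f, policies, wins))
for leaf in leaves(tree):
    print(leaf["family"], leaf["verdict"], leaf["policy"])
```

The tree produced by this naive splitting heuristic is not necessarily the smallest one. Reducing such redundancy is presumably the role of the post-processing step mentioned in the abstract, which this sketch does not model.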
