Small Decision Trees for MDPs with Deductive Synthesis
Roman Andriushchenko, Milan Češka, Sebastian Junges, Filip Macák
TL;DR
The paper addresses the challenge of representing optimal or near-optimal policies for Markov decision processes (MDPs) with small, interpretable decision trees (DTs). It introduces a two‑pronged approach: (i) an SMT-based encoding to map a given policy into a small DT, and (ii) dtPaynt, an abstraction‑refinement loop that searches for policies optimal within the class of DTs up to a fixed depth. The method leverages unsatisfiable cores and a harmonization mechanism to iteratively refine the search, overapproximate policy families, and split the space into tractable subproblems. Empirically, dtPaynt outperforms the state-of-the-art MILP-based synthesis (OMDT) on many benchmarks, producing DTs up to 20× smaller and achieving near‑optimal values on models with up to 10k states and 19 variables, and it scales to much larger MCs when small DTs suffice. This work advances interpretable policy synthesis for large, real-world MDPs and offers a practical reduction pathway to compact, verifiable decision trees for complex policies.
Abstract
Markov decision processes (MDPs) describe sequential decision-making processes; MDP policies return for every state in that process an advised action. Classical algorithms can efficiently compute policies that are optimal with respect to, e.g., reachability probabilities. However, these policies are then given in a tabular format. A longstanding challenge is to represent optimal or almost-optimal policies concisely, e.g., as decision trees. This paper makes two contributions towards this challenge: first, an SMT-based approach to encode a given (optimal) policy as a small decision tree, and second, an abstraction-refinement loop that searches for policies that are optimal within the set of policies that can be represented with a small tree. Technically, the latter combines the SMT encoding with verification approaches for families of Markov chains. The empirical evaluation demonstrates the feasibility of these approaches and shows how they can outperform the state-of-the-art on various benchmarks, yielding up to 20 times smaller trees representing (almost) optimal policies for models with up to 10k states and 19 variables.
