Table of Contents
Fetching ...

1-2-3-Go! Policy Synthesis for Parameterized Markov Decision Processes via Decision-Tree Learning and Generalization

Muqsit Azeem, Debraj Chakraborty, Sudeep Kanav, Jan Kretinsky, Mohammadsadegh Mohagheghi, Stefanie Mohr, Maximilian Weininger

TL;DR

This work proposes a learning-based approach to generalize optimal policies obtained by model-checking small instances to larger ones using decision-tree learning, providing a practical solution to the state-space explosion problem.

Abstract

Despite the advances in probabilistic model checking, the scalability of the verification methods remains limited. In particular, the state space often becomes extremely large when instantiating parameterized Markov decision processes (MDPs) even with moderate values. Synthesizing policies for such \emph{huge} MDPs is beyond the reach of available tools. We propose a learning-based approach to obtain a reasonable policy for such huge MDPs. The idea is to generalize optimal policies obtained by model-checking small instances to larger ones using decision-tree learning. Consequently, our method bypasses the need for explicit state-space exploration of large models, providing a practical solution to the state-space explosion problem. We demonstrate the efficacy of our approach by performing extensive experimentation on the relevant models from the quantitative verification benchmark set. The experimental results indicate that our policies perform well, even when the size of the model is orders of magnitude beyond the reach of state-of-the-art analysis tools.

1-2-3-Go! Policy Synthesis for Parameterized Markov Decision Processes via Decision-Tree Learning and Generalization

TL;DR

This work proposes a learning-based approach to generalize optimal policies obtained by model-checking small instances to larger ones using decision-tree learning, providing a practical solution to the state-space explosion problem.

Abstract

Despite the advances in probabilistic model checking, the scalability of the verification methods remains limited. In particular, the state space often becomes extremely large when instantiating parameterized Markov decision processes (MDPs) even with moderate values. Synthesizing policies for such \emph{huge} MDPs is beyond the reach of available tools. We propose a learning-based approach to obtain a reasonable policy for such huge MDPs. The idea is to generalize optimal policies obtained by model-checking small instances to larger ones using decision-tree learning. Consequently, our method bypasses the need for explicit state-space exploration of large models, providing a practical solution to the state-space explosion problem. We demonstrate the efficacy of our approach by performing extensive experimentation on the relevant models from the quantitative verification benchmark set. The experimental results indicate that our policies perform well, even when the size of the model is orders of magnitude beyond the reach of state-of-the-art analysis tools.

Paper Structure

This paper contains 39 sections, 3 figures, 9 tables, 1 algorithm.

Figures (3)

  • Figure 1: A parameterized, scalable MDP with $k+1$ blocks, described in Example \ref{['ex:modules-mdp']}.
  • Figure 2: (a) MDP instance for $m=1$ from Figure \ref{['fig:modules-mdp']}. (b) The optimal policy represented as a DT as learned by our approach.
  • Figure 3: Robustness of the policy given by 1-2-3-Go!, Storm and MODES for different instances of zeroconf_dl + deadline_max and philosophers + eat. For larger models, while Storm times out and MODES gives sub-optimal policies, 1-2-3-Go! consistently gives better results irrespective of $N$.

Theorems & Definitions (3)

  • definition 1
  • definition 2
  • definition 3