Table of Contents
Fetching ...

Constrained and Robust Policy Synthesis with Satisfiability-Modulo-Probabilistic-Model-Checking

Linus Heck, Filip Macák, Milan Češka, Sebastian Junges

TL;DR

This paper contributes the first approach to effectively compute robust policies subject to arbitrary structural constraints using a flexible and efficient framework and achieves flexibility by allowing to express constraints in first-order theory over a set of MDPs.

Abstract

The ability to compute reward-optimal policies for given and known finite Markov decision processes (MDPs) underpins a variety of applications across planning, controller synthesis, and verification. However, we often want policies (1) to be robust, i.e., they perform well on perturbations of the MDP and (2) to satisfy additional structural constraints regarding, e.g., their representation or implementation cost. Computing such robust and constrained policies is indeed computationally more challenging. This paper contributes the first approach to effectively compute robust policies subject to arbitrary structural constraints using a flexible and efficient framework. We achieve flexibility by allowing to express our constraints in a first-order theory over a set of MDPs, while the root for our efficiency lies in the tight integration of satisfiability solvers to handle the combinatorial nature of the problem and probabilistic model checking algorithms to handle the analysis of MDPs. Experiments on a few hundred benchmarks demonstrate the feasibility for constrained and robust policy synthesis and the competitiveness with state-of-the-art methods for various fragments of the problem.

Constrained and Robust Policy Synthesis with Satisfiability-Modulo-Probabilistic-Model-Checking

TL;DR

This paper contributes the first approach to effectively compute robust policies subject to arbitrary structural constraints using a flexible and efficient framework and achieves flexibility by allowing to express constraints in first-order theory over a set of MDPs.

Abstract

The ability to compute reward-optimal policies for given and known finite Markov decision processes (MDPs) underpins a variety of applications across planning, controller synthesis, and verification. However, we often want policies (1) to be robust, i.e., they perform well on perturbations of the MDP and (2) to satisfy additional structural constraints regarding, e.g., their representation or implementation cost. Computing such robust and constrained policies is indeed computationally more challenging. This paper contributes the first approach to effectively compute robust policies subject to arbitrary structural constraints using a flexible and efficient framework. We achieve flexibility by allowing to express our constraints in a first-order theory over a set of MDPs, while the root for our efficiency lies in the tight integration of satisfiability solvers to handle the combinatorial nature of the problem and probabilistic model checking algorithms to handle the analysis of MDPs. Experiments on a few hundred benchmarks demonstrate the feasibility for constrained and robust policy synthesis and the competitiveness with state-of-the-art methods for various fragments of the problem.

Paper Structure

This paper contains 56 sections, 9 theorems, 7 equations, 13 figures, 12 tables, 1 algorithm.

Key Result

Theorem 4

For a colored MDP $\mathcal{C}$, the assignment $\theta_X \in \mathbb{V}_X$ over variables $X = \{x_1, \ldots, x_n\}$ is a satisfying assignment of the existential quantifier in the sentence $\Phi$ if and only if $\theta_X$ is a solution to the problem statement.

Figures (13)

  • Figure 1: One axis represents the set of policies $\pi_i$, the other the set of possible environments $e_j$. Some policies (or environments) may not be relevant ($\boldsymbol{\bot}$), e.g. when the policy does not satisfy structural constraints. For all tuples of relevant policy $\pi_i$ and environment $e_j$, the combination performs satisfactory (✓) or unsatisfactory (✗). Satisfactory performance is captured by meeting a threshold $\nu$ on the value of the policy on the environment. The general task is to find a policy $\pi_i$ such that for every environment $e_j$, cell $(i,j)$ is ✓. While this figure uses a tabular representation, we use concise symbolic descriptions.
  • Figure 2: A separation of concerns using guess-and-check or counterexample-guided-inductive-synthesis.
  • Figure 3: Running example: Beetle wants to reach the target.
  • Figure 4: Markov chains induced by different beetle policies.
  • Figure 5: Sketch of CDCL.
  • ...and 8 more figures

Theorems & Definitions (18)

  • Definition 1: Markov Decision Process
  • Definition 2: Colored MDP
  • Definition 3: Induced MDP
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Theorem 6
  • Theorem 6
  • Theorem 6
  • proof
  • ...and 8 more