Learning Robust Policies for Uncertain Parametric Markov Decision Processes

Luke Rickard, Alessandro Abate, Kostas Margellos

TL;DR

This work addresses robust policy synthesis for uncertain parametric MDPs (upMDPs) using the scenario approach, which yields PAC-type guarantees on PCTL satisfaction without assuming a known parameter distribution. It formalizes three policy classes (deterministic, mixed, behavioural) and offers a solution strategy for each: interval MDPs for deterministic policies, MaxMin games for mixed policies, and subgradient ascent for behavioural policies. The authors derive PAC guarantees tailored to each method. Numerical experiments on UAV motion planning and benchmark models show that the subgradient approach achieves non-trivial risk bounds while maintaining competitive satisfaction probabilities, often outperforming naive interval approaches, and they highlight the trade-off with computational cost. The framework advances verifiable robustness in upMDPs for safety-critical applications by providing policy guarantees that hold for unseen parameter realizations, without requiring knowledge of the parameter distribution. These contributions are a step towards scalable, provably safe control under structural uncertainty in finite-state dynamical systems.

Abstract

Synthesising verifiably correct controllers for dynamical systems is crucial for safety-critical problems. To achieve this, it is important to account for uncertainty in a robust manner, while at the same time it is often of interest to avoid being overly conservative with the view of achieving a better cost. We propose a method for verifiably safe policy synthesis for a class of finite state models, under the presence of structural uncertainty. In particular, we consider uncertain parametric Markov decision processes (upMDPs), a special class of Markov decision processes, with parameterised transition functions, where such parameters are drawn from a (potentially) unknown distribution. Our framework leverages recent advancements in the so-called scenario approach theory, where we represent the uncertainty by means of scenarios, and provide guarantees on synthesised policies satisfying probabilistic computation tree logic (PCTL) formulae. We consider several common benchmarks/problems and compare our work to recent developments for verifying upMDPs.
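
As a rough illustration of representing uncertainty by scenarios, the sketch below samples parameter values and picks the action whose worst-case satisfaction probability over the samples is highest. The toy one-step model, the action names `safe` and `risky`, the constant `SAFE_SUCCESS`, and the uniform sampling distribution are all assumptions made for this example. They are not taken from the paper, and no PAC bound is computed here.

```python
import random

# Hypothetical toy model: from the initial state, "safe" reaches the goal
# with a fixed probability, while "risky" reaches it with probability theta,
# where theta is the uncertain parameter.
SAFE_SUCCESS = 0.6  # assumed value, for illustration only


def satisfaction_probability(action, theta):
    """Probability of reaching the goal under one parameter sample."""
    return SAFE_SUCCESS if action == "safe" else theta


def robust_action(scenarios):
    """Pick the action maximising the worst case over the sampled scenarios."""
    worst_case = {
        action: min(satisfaction_probability(action, theta) for theta in scenarios)
        for action in ("safe", "risky")
    }
    best = max(worst_case, key=worst_case.get)
    return best, worst_case[best]


# The sampler stands in for the unknown distribution; only samples are used.
rng = random.Random(0)
scenarios = [rng.uniform(0.5, 1.0) for _ in range(100)]
action, guaranteed = robust_action(scenarios)
print(action, guaranteed)
```

The returned value is only a guarantee on the sampled scenarios. Extending it to unseen parameter values is the role of the scenario-approach bounds discussed in the paper.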

Paper Structure (37 sections, 6 theorems, 26 equations, 4 figures, 3 tables, 1 algorithm)

This paper contains 37 sections, 6 theorems, 26 equations, 4 figures, 3 tables, 1 algorithm.

Key Result

theorem 1

For a two-player zero-sum game, there exists at least one mixed strategy profile $s^\star = (s^\star_p,s^\star_\sigma)$ satisfying the two Nash equilibrium inequalities, under which neither player can improve their payoff by deviating unilaterally. If both inequalities are strict, then there is a unique Nash equilibrium, called a strict Nash equilibrium. Otherwise, there is a set of Nash equilibria, all having equal value.
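
To make the equilibrium conditions concrete, the following sketch computes the mixed equilibrium of a generic 2x2 zero-sum matrix game in closed form and checks that no pure deviation helps either player. The payoff matrix `A` and the function names are hypothetical and chosen for illustration. This is a textbook matrix game, not the paper's MaxMin construction.

```python
from fractions import Fraction


def solve_2x2_zero_sum(A):
    """Mixed equilibrium of a 2x2 zero-sum game via the indifference conditions.

    A[i][j] is the payoff to the row player (maximiser); the column player
    receives -A[i][j]. Returns (p, q, value), where p is the probability of
    row 0 and q is the probability of column 0.
    """
    (a, b), (c, d) = A
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate game: closed form does not apply")
    p = Fraction(d - c, denom)
    q = Fraction(d - b, denom)
    if not (0 <= p <= 1 and 0 <= q <= 1):
        raise ValueError("closed form does not give a valid mixed strategy")
    value = Fraction(a * d - b * c, denom)
    return p, q, value


def expected_payoff(A, p, q):
    """Expected payoff to the row player under mixed strategies p and q."""
    row = (p, 1 - p)
    col = (q, 1 - q)
    return sum(row[i] * col[j] * A[i][j] for i in range(2) for j in range(2))


A = [[3, -1], [-2, 1]]  # hypothetical payoffs
p_star, q_star, value = solve_2x2_zero_sum(A)

# Equilibrium check: the maximiser cannot gain, and the minimiser cannot
# lower the payoff, by switching to any pure strategy.
for pure in (0, 1):
    assert expected_payoff(A, pure, q_star) <= value
    assert expected_payoff(A, p_star, pure) >= value
```

In this example every pure deviation yields exactly the game value, so the inequalities hold with equality rather than strictly.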

Figures (4)

  • Figure 1: Distance from optimal satisfaction probability (found from MNE algorithm) across iterations for small test model
  • Figure 2: Distance from final satisfaction probability across iterations for UAV model with uniform wind
  • Figure 3: Runtime against number of samples
  • Figure 4: Runtime against size of MDP

Theorems & Definitions (6)

  • theorem 1: Nash Equilibrium [nashNoncooperativeGames1989, DBLP:journals/tac/FrihaufKB12]
  • theorem 2: PAC Guarantees [garattiRiskComplexityScenario2022a]
  • corollary 1: PAC Guarantees for Mixed Policies
  • corollary 2: PAC Guarantees for Behavioural Policies
  • corollary 3: PAC Guarantees for iMDP Policies
  • corollary 4