Learning Robust Policies for Uncertain Parametric Markov Decision Processes

Luke Rickard; Alessandro Abate; Kostas Margellos

TL;DR

This work addresses robust policy synthesis for uncertain parametric MDPs (upMDPs) using the scenario approach, which yields PAC-type guarantees on PCTL satisfaction without assuming a known parameter distribution. It formalizes three policy classes (deterministic, mixed, behavioural) and pairs each with a solution strategy: interval MDPs for deterministic policies, MaxMin games for mixed policies, and subgradient ascent for behavioural policies. The authors derive PAC guarantees tailored to each method. Numerical experiments on UAV motion planning and benchmark models show that the subgradient approach achieves non-trivial risk bounds while maintaining competitive satisfaction probabilities, often outperforming naive interval approaches, and they highlight the trade-off with computational cost. Because the guarantees cover unseen parameter realizations and need no strict parametric assumptions, the framework supports verifiably robust upMDP policies for safety-critical applications and is a step towards scalable, provably safe control of finite-state dynamical systems under structural uncertainty.

Abstract

Synthesising verifiably correct controllers for dynamical systems is crucial for safety-critical problems. To achieve this, it is important to account for uncertainty in a robust manner, while at the same time avoiding excessive conservatism, with a view to achieving a better cost. We propose a method for verifiably safe policy synthesis for a class of finite-state models in the presence of structural uncertainty. In particular, we consider uncertain parametric Markov decision processes (upMDPs), a special class of Markov decision processes with parameterised transition functions, whose parameters are drawn from a (potentially) unknown distribution. Our framework leverages recent advancements in the so-called scenario approach theory: we represent the uncertainty by means of scenarios and provide guarantees that synthesised policies satisfy probabilistic computation tree logic (PCTL) formulae. We consider several common benchmarks/problems and compare our work to recent developments for verifying upMDPs.
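To make the scenario idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical four-state upMDP with a single parameter `theta`, a reachability objective, and a uniform sampling distribution used only to generate scenarios. The names `build_mdp`, `reach_prob` and `worst_case` are illustrative. The sketch evaluates two fixed deterministic policies on every sampled scenario and reports the empirical worst case; it does not compute the PAC risk bounds derived in the paper.

```python
import random

GOAL, SINK = 2, 3  # absorbing states of the hypothetical toy model


def build_mdp(theta):
    """Instantiate the toy upMDP for one parameter sample theta in [0, 1]."""
    return {
        0: {"risky": [(GOAL, theta), (SINK, 1.0 - theta)],
            "safe": [(1, 1.0)]},
        1: {"go": [(GOAL, 0.7), (0, 0.1), (SINK, 0.2)]},
    }


def reach_prob(mdp, policy, start=0, iters=200):
    """Probability of eventually reaching GOAL from `start` under a fixed policy."""
    v = {0: 0.0, 1: 0.0, GOAL: 1.0, SINK: 0.0}
    for _ in range(iters):
        new_v = dict(v)
        for s, actions in mdp.items():
            # One Bellman backup for the action the policy picks in state s.
            new_v[s] = sum(p * v[t] for t, p in actions[policy[s]])
        v = new_v
    return v[start]


def worst_case(policy, thetas):
    """Smallest satisfaction probability of `policy` over the sampled scenarios."""
    return min(reach_prob(build_mdp(th), policy) for th in thetas)


# Assumption: scenarios are i.i.d. draws; the uniform law is only for simulation.
rng = random.Random(0)
thetas = [rng.uniform(0.5, 0.95) for _ in range(50)]

risky = {0: "risky", 1: "go"}
safe = {0: "safe", 1: "go"}
risky_wc = worst_case(risky, thetas)
safe_wc = worst_case(safe, thetas)
print(f"risky worst case: {risky_wc:.3f}, safe worst case: {safe_wc:.3f}")
```

In this toy model the safe policy never uses the uncertain transition, so its worst case is 0.7/0.9 (about 0.778) for every scenario, whereas the risky policy's worst case equals the smallest sampled `theta`. A robust synthesis method would prefer whichever policy has the larger worst-case value on the samples.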

Paper Structure (37 sections, 6 theorems, 26 equations, 4 figures, 3 tables, 1 algorithm)


Introduction
Background
Markov Decision Processes
Policy Classes
Uncertain Parametric MDP
Probabilistic Computation Tree Logic
PCTL Satisfaction
Robust Policies
Robust Policy Synthesis
Solution by Interval MDPs (under deterministic policies)
MaxMin Game (under mixed policies)
Subgradient Ascent (under behavioural policies)
Guarantees
Numerical Experiments
Concluding Remarks and Future Directions
...and 22 more sections

Key Result

theorem 1

For a two-player zero-sum game, there exists at least one mixed strategy profile $s^\star = (s^\star_p,s^\star_\sigma)$ such that neither player can gain by unilaterally deviating from it, a condition expressed by a pair of saddle-point inequalities. If both inequalities are strict, then there is a single unique Nash equilibrium, called a strict Nash equilibrium. Otherwise, there is a set of Nash equilibria, all having equal value.
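For reference, the equilibrium condition in its standard textbook form is sketched here. The notation is an assumption, not taken from the paper: $u$ denotes the payoff that player $p$ maximises and player $\sigma$ minimises, and $s_p$, $s_\sigma$ range over all mixed strategies of the respective players.

$$
u(s_p, s^\star_\sigma) \;\le\; u(s^\star_p, s^\star_\sigma) \;\le\; u(s^\star_p, s_\sigma) \qquad \text{for all } s_p,\, s_\sigma .
$$

The left inequality says that player $p$ cannot increase the payoff by deviating from $s^\star_p$, and the right inequality says that player $\sigma$ cannot decrease it by deviating from $s^\star_\sigma$.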

Figures (4)

Figure 1: Distance from optimal satisfaction probability (found from MNE algorithm) across iterations for small test model
Figure 2: Distance from final satisfaction probability across iterations for UAV model with uniform wind
Figure 3: Runtime against number of samples
Figure 4: Runtime against size of MDP

Theorems & Definitions (6)

theorem 1: Nash Equilibrium (nashNoncooperativeGames1989; DBLP:journals/tac/FrihaufKB12)
theorem 2: PAC Guarantees (garattiRiskComplexityScenario2022a)
corollary 1: PAC Guarantees for Mixed Policies
corollary 2: PAC Guarantees for Behavioural Policies
corollary 3: PAC Guarantees for iMDP Policies
corollary 4
