Learning Robust Policies for Uncertain Parametric Markov Decision Processes
Luke Rickard, Alessandro Abate, Kostas Margellos
TL;DR
This work addresses robust policy synthesis for uncertain parametric MDPs (upMDPs) using the scenario approach, which yields PAC-type guarantees on PCTL satisfaction without assuming a known parameter distribution. It formalizes three policy classes (deterministic, mixed, behavioural) and pairs each with a solution strategy: interval MDPs for deterministic policies, MaxMin games for mixed policies, and subgradient ascent for behavioural policies. The authors derive PAC guarantees tailored to each method. Numerical experiments on UAV motion planning and benchmark models show that the subgradient approach achieves non-trivial risk bounds while maintaining competitive satisfaction probabilities, often outperforming naive interval approaches, and they highlight the trade-off with computational cost. The framework thus supports verifiable robustness for safety-critical applications: the policy guarantees cover unseen parameter realizations and require no strict parametric assumptions. These contributions are a step towards scalable, provably safe control under structural uncertainty in finite-state dynamical systems.
Abstract
Synthesising verifiably correct controllers for dynamical systems is crucial for safety-critical problems. To achieve this, it is important to account for uncertainty in a robust manner, while avoiding excessive conservatism so as to achieve a better cost. We propose a method for verifiably safe policy synthesis for a class of finite-state models in the presence of structural uncertainty. In particular, we consider uncertain parametric Markov decision processes (upMDPs), a special class of Markov decision processes with parameterised transition functions, whose parameters are drawn from a (potentially) unknown distribution. Our framework leverages recent advancements in the so-called scenario approach theory: we represent the uncertainty by means of scenarios, and provide guarantees that the synthesised policies satisfy probabilistic computation tree logic (PCTL) formulae. We consider several common benchmark problems and compare our work to recent developments in verifying upMDPs.
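To make "representing the uncertainty by means of scenarios" concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a hypothetical four-state Markov chain (a toy upMDP with the policy already fixed), a single uncertain parameter p, and a uniform sampling range chosen only to generate scenarios; the function names are likewise hypothetical. It evaluates a reachability probability on each sampled scenario and reports the empirical worst case. It does not synthesise a policy and does not compute the PAC bound that would accompany such a value.

```python
import random


def transition_model(p):
    """Toy chain induced by a fixed policy; p is the uncertain parameter.

    States: 0 (start), 1 (intermediate), 2 (goal, absorbing), 3 (fail, absorbing).
    This model is an assumption made for illustration only.
    """
    return {
        0: {1: p, 3: 1.0 - p},
        1: {2: p, 0: 1.0 - p},
        2: {2: 1.0},
        3: {3: 1.0},
    }


def reach_probability(p, goal=2, start=0, tol=1e-12, max_iter=10_000):
    """Probability of eventually reaching `goal` from `start` (fixed-point iteration)."""
    model = transition_model(p)
    x = {s: 1.0 if s == goal else 0.0 for s in model}
    for _ in range(max_iter):
        new_x = {
            s: 1.0 if s == goal else sum(q * x[t] for t, q in model[s].items())
            for s in model
        }
        if max(abs(new_x[s] - x[s]) for s in model) < tol:
            return new_x[start]
        x = new_x
    return x[start]


def worst_case_over_scenarios(num_scenarios, seed=0):
    """Sample parameter scenarios and return them with the minimum reach probability."""
    rng = random.Random(seed)
    # The sampling distribution is only used to draw scenarios; the worst-case
    # computation below never uses any knowledge of it.
    scenarios = [rng.uniform(0.6, 0.9) for _ in range(num_scenarios)]
    values = [reach_probability(p) for p in scenarios]
    return scenarios, min(values)


scenarios, worst_case = worst_case_over_scenarios(100)
print(f"worst case over {len(scenarios)} scenarios: {worst_case:.3f}")
```

In this toy chain the reachability probability has the closed form p^2 / (1 - p + p^2), which increases with p, so the worst case is attained at the smallest sampled parameter. The scenario approach then asks how likely it is that a new, unseen parameter gives a lower satisfaction probability than this empirical worst case.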
