Table of Contents
Fetching ...

Robust Almost-Sure Reachability in Multi-Environment MDPs

Marck van der Vegt, Nils Jansen, Sebastian Junges

TL;DR

The paper addresses robustness of reachability in multi-environment MDPs (MEMDPs), where a single policy must succeed across all environments. It introduces a belief-observation MDP (BOMDP) framework that couples MEMDP states with a monotone belief over environments, and proves that a policy winning in the BOMDP is winning for the MEMDP, enabling memoryless policy search for the almost-sure reachability objective. The authors establish PSPACE-completeness for the MEMDP decision problem and show that optimal robust policies may require exponential memory in the number of environments, but they derive a PSPACE algorithm that recurses on belief-fragments using a locally bounded, environment-graph structure. They also implement a practical prototype, PaGE, on Storm to synthesize robust policies and demonstrate competitive performance against POMDP-based baselines on benchmarks, highlighting the approach’s scalability and applicability to real-world robustness problems.

Abstract

Multiple-environment MDPs (MEMDPs) capture finite sets of MDPs that share the states but differ in the transition dynamics. These models form a proper subclass of partially observable MDPs (POMDPs). We consider the synthesis of policies that robustly satisfy an almost-sure reachability property in MEMDPs, that is, one policy that satisfies a property for all environments. For POMDPs, deciding the existence of robust policies is an EXPTIME-complete problem. In this paper, we show that this problem is PSPACE-complete for MEMDPs, while the policies in general require exponential memory. We exploit the theoretical results to develop and implement an algorithm that shows promising results in synthesizing robust policies for various benchmarks.

Robust Almost-Sure Reachability in Multi-Environment MDPs

TL;DR

The paper addresses robustness of reachability in multi-environment MDPs (MEMDPs), where a single policy must succeed across all environments. It introduces a belief-observation MDP (BOMDP) framework that couples MEMDP states with a monotone belief over environments, and proves that a policy winning in the BOMDP is winning for the MEMDP, enabling memoryless policy search for the almost-sure reachability objective. The authors establish PSPACE-completeness for the MEMDP decision problem and show that optimal robust policies may require exponential memory in the number of environments, but they derive a PSPACE algorithm that recurses on belief-fragments using a locally bounded, environment-graph structure. They also implement a practical prototype, PaGE, on Storm to synthesize robust policies and demonstrate competitive performance against POMDP-based baselines on benchmarks, highlighting the approach’s scalability and applicability to real-world robustness problems.

Abstract

Multiple-environment MDPs (MEMDPs) capture finite sets of MDPs that share the states but differ in the transition dynamics. These models form a proper subclass of partially observable MDPs (POMDPs). We consider the synthesis of policies that robustly satisfy an almost-sure reachability property in MEMDPs, that is, one policy that satisfies a property for all environments. For POMDPs, deciding the existence of robust policies is an EXPTIME-complete problem. In this paper, we show that this problem is PSPACE-complete for MEMDPs, while the policies in general require exponential memory. We exploit the theoretical results to develop and implement an algorithm that shows promising results in synthesizing robust policies for various benchmarks.
Paper Structure (3 sections, 1 figure)

This paper contains 3 sections, 1 figure.

Figures (1)

  • Figure 1: Example MEMDP

Theorems & Definitions (2)

  • definition thmcounterdefinition: MDP
  • definition thmcounterdefinition: MEMDP