Table of Contents
Fetching ...

A PSPACE Algorithm for Almost-Sure Rabin Objectives in Multi-Environment MDPs

Marnix Suilen, Marck van der Vegt, Sebastian Junges

TL;DR

This work studies MEMDPs, where a single policy must satisfy an almost-sure Rabin objective across a finite set of environments. The authors develop belief-based finite-memory strategies and show they suffice for almost-sure Rabin objectives, enabling a Belief-Observation MDP (BOMDP) construction that embeds beliefs into the state space. They introduce localized Rabin analysis on J-local MEMDPs with frontier states and present a recursive, PSPACE-bounded algorithm that combines local results to decide the global objective. The results establish $PSPACE$-completeness for almost-sure Rabin objectives in MEMDPs, clarifying the landscape for qualitative MEMDPs and showing that, under possible semantics, MEMDPs can match the complexity of standard MDPs for many objectives. The framework and BOMDP techniques pave the way for robust planning under model uncertainty and potential extensions to other qualitative and quantitative objectives.

Abstract

Markov Decision Processes (MDPs) model systems with uncertain transition dynamics. Multiple-environment MDPs (MEMDPs) extend MDPs. They intuitively reflect finite sets of MDPs that share the same state and action spaces but differ in the transition dynamics. The key objective in MEMDPs is to find a single policy that satisfies a given objective in every associated MDP. The main result of this paper is PSPACE-completeness for almost-sure Rabin objectives in MEMDPs. This result clarifies the complexity landscape for MEMDPs and contrasts with results for the more general class of partially observable MDPs (POMDPs), where almost-sure reachability is already EXPTIME-complete, and almost-sure Rabin objectives are undecidable.

A PSPACE Algorithm for Almost-Sure Rabin Objectives in Multi-Environment MDPs

TL;DR

This work studies MEMDPs, where a single policy must satisfy an almost-sure Rabin objective across a finite set of environments. The authors develop belief-based finite-memory strategies and show they suffice for almost-sure Rabin objectives, enabling a Belief-Observation MDP (BOMDP) construction that embeds beliefs into the state space. They introduce localized Rabin analysis on J-local MEMDPs with frontier states and present a recursive, PSPACE-bounded algorithm that combines local results to decide the global objective. The results establish -completeness for almost-sure Rabin objectives in MEMDPs, clarifying the landscape for qualitative MEMDPs and showing that, under possible semantics, MEMDPs can match the complexity of standard MDPs for many objectives. The framework and BOMDP techniques pave the way for robust planning under model uncertainty and potential extensions to other qualitative and quantitative objectives.

Abstract

Markov Decision Processes (MDPs) model systems with uncertain transition dynamics. Multiple-environment MDPs (MEMDPs) extend MDPs. They intuitively reflect finite sets of MDPs that share the same state and action spaces but differ in the transition dynamics. The key objective in MEMDPs is to find a single policy that satisfies a given objective in every associated MDP. The main result of this paper is PSPACE-completeness for almost-sure Rabin objectives in MEMDPs. This result clarifies the complexity landscape for MEMDPs and contrasts with results for the more general class of partially observable MDPs (POMDPs), where almost-sure reachability is already EXPTIME-complete, and almost-sure Rabin objectives are undecidable.
Paper Structure (19 sections, 28 theorems, 5 equations, 2 figures, 1 table, 3 algorithms)

This paper contains 19 sections, 28 theorems, 5 equations, 2 figures, 1 table, 3 algorithms.

Key Result

Theorem 6

Deciding possible reachability objectives for MEMDPs is in $\mathsf{NL}$. Deciding possible safety, Büchi, Co-Büchi, parity and Rabin objectives for MEMDPs is in $\mathsf{P}$.

Figures (2)

  • Figure 1: A MEMDP with three environments.
  • Figure 2: Example of a BOMDP fragment with Rabin objective $\Phi=\{\langle \{s_1\}, \{s_1\} \rangle, \langle \{s_2\}, \{s_2\} \rangle\}$.

Theorems & Definitions (50)

  • Definition 1: MDP
  • Definition 2: Winning
  • Definition 3: Winning Region
  • Definition 4: Rabin objective
  • Definition 5: MEMDP
  • Theorem 6
  • Theorem 7
  • Corollary 8
  • Definition 9: Belief, belief update
  • Corollary 10
  • ...and 40 more