The Value Problem for Multiple-Environment MDPs with Parity Objective

Krishnendu Chatterjee, Laurent Doyen, Jean-François Raskin, Ocan Sankur

TL;DR

This work analyzes multiple-environment MDPs (MEMDPs) with parity objectives, where a single strategy must perform well in every environment. It establishes complexity bounds: the limit-sure problem (is the value $=1$?) is PSPACE-complete in general and solvable in polynomial time when the number of environments is fixed, while the gap problem for approximating the value is solvable in double-exponential space. Pure strategies suffice for almost-sure (AS) and limit-sure (LS) winning, whereas randomization is necessary in general when the value is smaller than 1. The authors develop a suite of techniques: revealed-form MEMDPs, purification of non-distinguishing end-components (purge), common end-components, and learning-by-playing. These yield new characterizations for AS and LS parity, finite-memory witnesses, and an approximation algorithm for the gap problem. Together, the results position MEMDPs as a decidable alternative to POMDPs, for which all these problems are known to be undecidable, in tasks that require robust reasoning across multiple uncertain environments.

Abstract

We consider multiple-environment Markov decision processes (MEMDP), which consist of a finite set of MDPs over the same state space, representing different scenarios of transition structure and probability. The value of a strategy is the probability to satisfy the objective, here a parity objective, in the worst-case scenario, and the value of an MEMDP is the supremum of the values achievable by a strategy. We show that deciding whether the value is 1 is a PSPACE-complete problem, and even in P when the number of environments is fixed, along with new insights into the almost-sure winning problem, which is to decide if there exists a strategy with value 1. Pure strategies are sufficient for these problems, whereas randomization is necessary in general when the value is smaller than 1. We present an algorithm to approximate the value, running in double exponential space. Our results are in contrast to the related model of partially-observable MDPs where all these problems are known to be undecidable.
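
The definitions in the abstract can be written out as follows. This is a sketch in notation that is partly assumed: $E$ for the finite set of environments, $\varphi$ for the parity objective, $q$ for a fixed initial state, and $\mathrm{val}$ for the value are names chosen here for illustration, while $M[e]$ and $\mathbb{P}_{q}^{\sigma}$ follow the notation used in the figure captions and in Lemma 1.

$$
\mathrm{val}(\sigma) \;=\; \min_{e \in E} \, \mathbb{P}_{q}^{\sigma}(M[e], \varphi),
\qquad
\mathrm{val}(M) \;=\; \sup_{\sigma} \, \mathrm{val}(\sigma).
$$

With these names, the almost-sure winning problem asks whether some strategy $\sigma$ has $\mathrm{val}(\sigma) = 1$, while the value-1 (limit-sure) problem asks whether $\mathrm{val}(M) = 1$, that is, whether for every $\varepsilon > 0$ there is a strategy $\sigma$ with $\mathrm{val}(\sigma) \geq 1 - \varepsilon$. The two can differ because the supremum need not be attained by any single strategy.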

Paper Structure

This paper contains 16 sections, 24 theorems, 49 equations, 5 figures, 2 algorithms.

Key Result

lemma 1

Given an MDP $M$, for all states $q \in Q$ and all strategies $\sigma$, we have $\mathbb{P}_{q}^{\sigma}(M, \{\pi \mid \textrm{\sf Inf}(\pi) \text{ is the support of an end-component}\}) = 1$.
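
Lemma 1 says that, with probability 1, the set of states visited infinitely often is the support of an end-component, so whether a run satisfies a parity objective can be read off the priorities of that set. The snippet below is a minimal sketch of that check, assuming the convention that a run wins when the smallest priority seen infinitely often is even (this summary does not state which convention the paper uses). The function name `satisfies_parity` is hypothetical, and the priorities are the example assignment suggested in the caption of Figure 5.

```python
def satisfies_parity(inf_states, priority):
    """Return True if the smallest priority among inf_states is even.

    inf_states: non-empty set of states visited infinitely often.
    priority:   dict mapping each state to a non-negative integer priority.
    Assumption: min-even parity convention.
    """
    if not inf_states:
        raise ValueError("the set of states visited infinitely often is non-empty")
    return min(priority[q] for q in inf_states) % 2 == 0


# Example priorities taken from the caption of Figure 5.
priority = {"q3": 0, "q4": 0, "q5": 1, "q6": 1}

d_wins = satisfies_parity({"q3", "q4"}, priority)        # states of D
d_prime_wins = satisfies_parity({"q5", "q6"}, priority)  # states of D'
print(d_wins, d_prime_wins)
```

With a single priority per component, as here, the min-even and max-even conventions agree, so the outcome for $D$ and $D'$ does not depend on the assumed convention.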

Figures (5)

  • Figure 1: Multiple-environment MDP for the missing card (over 3-card deck). Each $M[e_i]$ represents the behavior of the MEMDP under environment $e_i$ where card $i$ has been removed. The environment can be identified almost-surely (with probability $1$).
  • Figure 2: Multiple-environment MDP for the duplicate card (over 3-card deck). Each $M[e_i]$ represents the behavior of the MEMDP under environment $e_i$ where card $i$ has been duplicated. The environment can be identified limit-surely (with probability arbitrarily close to $1$).
  • Figure 3: An end-component $\{q_1,q_2\}$ with different transition probabilities in environments $e_1$ and $e_2$.
  • Figure 4: The set $\{q_1,q_2\}$ is an end-component in $e_2$, not in $e_1$.
  • Figure 5: An MEMDP $M$ with two environments (left) and the construction $\textrm{\sf purge}(M)$ (right). Transition probabilities are uniform. Here $D$ is the MCEC defined by the pairs $\{(q_3,a), (q_4,a)\}$, and $D'$ is the MCEC defined by $\{(q_5,a), (q_6,a)\}$. The priority function is omitted; we assume that $D$ is winning (e.g., by assigning priority $0$ to $q_3$ and $q_4$) and that $D'$ is losing (e.g., by assigning priority $1$ to $q_5$ and $q_6$).
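
Figures 3 and 4 illustrate that the same set of state-action pairs can behave differently across environments, and may be an end-component in one environment but not in another. The sketch below checks the standard end-component conditions (closure under the chosen actions and strong connectivity) in a single environment. The function `is_end_component`, the dictionary encoding of transitions, and the two toy environments `e1` and `e2` are all assumptions made for illustration; they are not the transition structures drawn in the figures.

```python
from collections import defaultdict


def is_end_component(trans, pairs):
    """Check whether `pairs` is an end-component of one environment.

    trans: dict mapping (state, action) to a dict {successor: probability}.
    pairs: set of (state, action) pairs defining the candidate component.
    """
    if not pairs:
        return False
    states = {q for q, _ in pairs}
    graph = defaultdict(set)
    for q, a in pairs:
        dist = trans.get((q, a))
        if dist is None:
            return False  # action not available in this environment
        support = {s for s, p in dist.items() if p > 0}
        if not support <= states:
            return False  # the action can leave the candidate set
        graph[q] |= support

    # Strong connectivity: one root reaches, and is reached from, every state.
    reverse = defaultdict(set)
    for u, successors in list(graph.items()):
        for v in successors:
            reverse[v].add(u)

    def reachable(start, edges):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in edges[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    root = next(iter(states))
    return reachable(root, graph) == states and reachable(root, reverse) == states


# Hypothetical two-environment example over the same state space.
e1 = {("q1", "a"): {"q2": 0.5, "q3": 0.5}, ("q2", "a"): {"q1": 1.0}, ("q3", "a"): {"q3": 1.0}}
e2 = {("q1", "a"): {"q2": 1.0}, ("q2", "a"): {"q1": 1.0}, ("q3", "a"): {"q3": 1.0}}
pairs = {("q1", "a"), ("q2", "a")}

print(is_end_component(e1, pairs), is_end_component(e2, pairs))
```

In this toy example the pair set leaks to `q3` in `e1` but is closed and strongly connected in `e2`, mirroring the situation described for Figure 4. Only the support of each distribution matters for the check, so environments that differ in probabilities alone (as in Figure 3) give the same answer.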

Theorems & Definitions (40)

  • lemma 1: CY-acm95, DeAlfaro-phd97
  • theorem 1: vdVJJ23, SVJ24
  • lemma 2
  • proof
  • theorem 2
  • lemma 3
  • proof
  • theorem 3
  • proof
  • lemma 4
  • ...and 30 more