Multi-Environment MDPs with Prior and Universal Semantics

Benjamin Bordais; Jean-François Raskin

TL;DR

This work analyzes MEMDPs under two semantics, universal (the environment is chosen adversarially) and prior (the environment is drawn from a fixed distribution), and establishes that the qualitative (value-one) questions coincide across the two semantics. It provides a space-efficient algorithm to approximate the prior value for parity objectives, which decides the associated gap problem in PSPACE for unary probabilities and in EXPSPACE otherwise. It then shows that the universal value equals the infimum of the prior values over all beliefs, yielding a universal gap procedure of the same complexity that improves on earlier doubly-exponential-space procedures. The results position MEMDPs under the prior semantics as a practical, tractable subclass of POMDPs, supported by an entropy-based characterization: any POMDP whose belief entropy never increases admits an exponential reduction to a prior-MEMDP. Together, these contributions advance the algorithmic understanding of MEMDPs and their relation to POMDPs while offering concrete complexity guarantees for qualitative and quantitative synthesis tasks.

Abstract

Multiple-environment Markov decision processes (MEMDPs) equip an MDP with several probabilistic transition functions (one per possible environment), so that the state is observable but the environment is not. Previous work studies two semantics: (i) the universal semantics, where an adversary picks the environment; and (ii) the prior semantics, where the environment is drawn once, before execution, from a fixed distribution. We clarify the relation between these semantics. For parity objectives, we show that the qualitative questions (i.e., whether the value is one) coincide, and we develop a new algorithm for the general value of MEMDPs under the prior semantics. In particular, we show that the prior value of an MEMDP with a parity objective can be approximated to any precision with a space-efficient algorithm; equivalently, the associated gap problem is decidable in PSPACE when probabilities are given in unary (and in EXPSPACE otherwise). We then prove that the universal value equals the infimum of the prior values over all beliefs. This yields a new algorithm for the universal gap problem with the same complexity (PSPACE for unary probabilities, EXPSPACE in general), improving on earlier doubly-exponential-space procedures. Finally, we observe that MEMDPs under the prior semantics form an important tractable subclass of POMDPs: our algorithms exploit the fact that belief entropy never increases, and we establish that any POMDP with this property reduces effectively to a prior-MEMDP, showing that prior-MEMDPs capture a broad and practically relevant subclass of POMDPs.
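To make the two semantics concrete, the following Python sketch evaluates one fixed memoryless strategy on a toy two-environment MEMDP with a reachability objective (a special case of a parity objective). The dictionary encoding, the functions `reach_prob`, `prior_value`, and `universal_value`, and the transition probabilities are all hypothetical and not taken from the paper. Under the prior semantics, the strategy's winning probability is the $b$-weighted average of its per-environment winning probabilities; under the universal semantics, it is the worst of them.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (not from the paper): for one environment,
# delta[(q, a)] maps each successor state to its probability.
Delta = Dict[Tuple[str, str], Dict[str, float]]


def reach_prob(delta: Delta, strategy: Dict[str, str], states: List[str],
               target: str, iters: int = 1000) -> Dict[str, float]:
    """Probability of eventually reaching `target` from each state in the
    Markov chain induced by a memoryless deterministic strategy, computed by
    value iteration from below. States without a chosen action (other than
    the target) are treated as losing sinks."""
    x = {q: (1.0 if q == target else 0.0) for q in states}
    for _ in range(iters):
        x = {
            q: (1.0 if q == target
                else sum(p * x[r] for r, p in delta[(q, strategy[q])].items())
                if q in strategy else 0.0)
            for q in states
        }
    return x


def prior_value(per_env: Dict[str, float], belief: Dict[str, float]) -> float:
    """Prior semantics: the environment is drawn once from `belief`."""
    return sum(belief[e] * per_env[e] for e in belief)


def universal_value(per_env: Dict[str, float]) -> float:
    """Universal semantics: an adversary picks the worst environment."""
    return min(per_env.values())


# Toy MEMDP: a single decision in state "s"; "win" and "lose" are absorbing.
states = ["s", "win", "lose"]
memdp = {
    "e1": {("s", "a"): {"win": 0.9, "lose": 0.1},
           ("s", "b"): {"win": 0.2, "lose": 0.8}},
    "e2": {("s", "a"): {"win": 0.3, "lose": 0.7},
           ("s", "b"): {"win": 0.8, "lose": 0.2}},
}
sigma = {"s": "a"}  # always play action "a" in state "s"
per_env = {e: reach_prob(d, sigma, states, "win")["s"] for e, d in memdp.items()}
print(per_env)                                        # {'e1': 0.9, 'e2': 0.3}
print(prior_value(per_env, {"e1": 0.5, "e2": 0.5}))   # 0.6 up to rounding
print(universal_value(per_env))                       # 0.3

# For a fixed strategy, the prior winning probability is linear in the belief,
# so its minimum over beliefs sits at a point mass: the worst environment.
grid = [{"e1": t / 100, "e2": 1 - t / 100} for t in range(101)]
print(min(prior_value(per_env, b) for b in grid))     # 0.3
```

The final grid check only illustrates the elementary fact that, for one fixed strategy, the infimum over beliefs of the prior winning probability equals the worst-case environment. The paper's result relating the universal value to prior values concerns values, i.e. suprema over strategies, and is not implied by this observation.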

Paper Structure (38 sections, 30 theorems, 133 equations, 1 figure, 2 algorithms)

Introduction
Universal and prior semantics.
Related work.
Definitions
MDPs and strategies.
Objectives.
MEMDPs.
Values in MEMDPs.
Qualitative and quantitative problem
Qualitative problem
Quantitative problem
Algorithm solving the gap problem.
Relating the $\mathsf{pr}$- and $\mathsf{uni}$-values
How to use Theorem \ref{thm:approx_value}.
Characterization of MEMDPs with entropy
...and 23 more sections

Key Result

Proposition 1

Consider an MEMDP $\Gamma$, a state $q \in Q$, and a parity objective $W$. Let $b \in \mathcal{D}(E)$ be such that $\mathsf{Supp}(b) = E$. Then $\mathsf{val}^{\mathsf{uni}}_q(\Gamma,W) = 1$ if and only if $\mathsf{val}^{\mathsf{pr}}_q(\Gamma,b,W) = 1$ and, for all $\sigma \in \mathsf{Strat}(Q,A)$: …
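Why full support of $b$ matters can be seen with a one-line argument for a single fixed strategy $\sigma$. The notation $p_e$, for the probability that $\sigma$ satisfies $W$ from $q$ in environment $e \in E$, is introduced only for this sketch. Since $\sum_{e \in E} b(e) = 1$, every term $b(e)\,(1 - p_e)$ is non-negative, and $b(e) > 0$ for all $e$ when $\mathsf{Supp}(b) = E$:

$$
\sum_{e \in E} b(e)\, p_e = 1
\;\iff\; \sum_{e \in E} b(e)\,(1 - p_e) = 0
\;\iff\; \forall e \in E:\; p_e = 1 .
$$

If some environment had $b(e) = 0$, the strategy could fail in it with positive probability without lowering its prior winning probability. Proposition 1 itself concerns values, i.e. suprema over strategies, and states an additional condition on all $\sigma \in \mathsf{Strat}(Q,A)$ that is cut off in this excerpt; this sketch does not reconstruct it.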

Figures (1)

Figure 1: The figure depicts an MEMDP (inspired by DBLP:conf/icalp/Chatterjee0RS25) modelling a simple card game played with a deck containing two card types, $1$ and $2$. The deck composition is parameterized by $\alpha_1$ and $\alpha_2$, with $\alpha_1+\alpha_2=1$; for instance, if card $1$ appears twice as often as card $2$, then $\alpha_1=2\alpha_2$. At each turn, the player may either guess which card type is in the majority or request an additional draw from the deck; after a draw request, the game continues with probability $1-\alpha_0$, while with probability $\alpha_0$ the player is forced to guess immediately. The player's objective is to reach the winning state $W$, which corresponds to correctly guessing the card type that has the larger number of copies in the deck; this is captured by suitable choices of the parameters (e.g., $\beta$ and $\gamma$) governing the transition from the guess action to $W$ or $L$.
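The belief dynamics behind this card game can be sketched in Python under loudly stated assumptions: each environment is taken to fix one deck composition $(\alpha_1, \alpha_2)$ (the values $2/3$ and $1/3$ below are hypothetical, chosen to match the "twice as often" example), draws are independent given the environment, and the forced-guess probability $\alpha_0$ and the parameters $\beta, \gamma$ are ignored. The names `ENVS`, `posterior`, and `best_guess` are illustrative only.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Hypothetical environments: each maps to (alpha_1, alpha_2), the
# probabilities of drawing card 1 and card 2 from the deck.
ENVS: Dict[str, Tuple[Fraction, Fraction]] = {
    "more_1": (Fraction(2, 3), Fraction(1, 3)),  # card 1 twice as frequent
    "more_2": (Fraction(1, 3), Fraction(2, 3)),  # card 2 twice as frequent
}


def posterior(prior: Dict[str, Fraction], draws: List[int]) -> Dict[str, Fraction]:
    """Bayesian posterior over environments after observing the drawn cards.

    Draws are i.i.d. given the environment, so only the counts n1, n2 matter:
    b'(e) is proportional to b(e) * alpha_1(e)**n1 * alpha_2(e)**n2.
    """
    n1, n2 = draws.count(1), draws.count(2)
    weights = {e: prior[e] * ENVS[e][0] ** n1 * ENVS[e][1] ** n2 for e in prior}
    total = sum(weights.values())
    return {e: w / total for e, w in weights.items()}


def best_guess(belief: Dict[str, Fraction]) -> Tuple[str, Fraction]:
    """Guess the most likely environment; the second component is the
    probability, under the belief, that this guess is correct."""
    e = max(belief, key=belief.__getitem__)
    return e, belief[e]


prior = {"more_1": Fraction(1, 2), "more_2": Fraction(1, 2)}
b = posterior(prior, [1, 1, 2])
print(b)              # {'more_1': Fraction(2, 3), 'more_2': Fraction(1, 3)}
print(best_guess(b))  # ('more_1', Fraction(2, 3))
```

Because the posterior depends only on how many cards of each type were drawn, the order of draws is irrelevant. In this simplified model, guessing the most likely composition wins with probability equal to the largest posterior weight, which is the trade-off the player weighs against requesting another draw.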

Theorems & Definitions (75)

Proposition 1 (proof: \ref{proof:prop_value_one})
Proof sketch
Corollary 2 (of DBLP:conf/icalp/Chatterjee0RS25)
Definition 3: Belief update
Lemma 4 (proof: \ref{proof:lem_small_belief_change_ok})
Theorem 5 (proof: \ref{proof:thm_main_enough_actions_small_belief})
Proof sketch
Theorem 6
Theorem 7
Lemma 8 (proof: \ref{proof:lem_vonBeuman})
...and 65 more
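Definition 3 (belief update) is not reproduced in this excerpt. Assuming it is the usual Bayesian update of a belief over environments after the player observes a transition $(q, a, q')$, that is $b'(e) \propto b(e)\,\delta_e(q,a)(q')$ where $\delta_e$ is the transition function of environment $e$, a minimal Python sketch follows. The dictionary encoding and the names `belief_update` and `support` are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Optional, Set, Tuple

# Hypothetical encoding: memdp[e][(q, a)][q2] is the probability of moving
# from state q to state q2 under action a in environment e.
MEMDP = Dict[str, Dict[Tuple[str, str], Dict[str, Fraction]]]


def belief_update(memdp: MEMDP, b: Dict[str, Fraction], q: str, a: str,
                  q2: str) -> Optional[Dict[str, Fraction]]:
    """Bayesian update of the belief over environments after observing the
    transition (q, a, q2): b'(e) is proportional to b(e) * delta_e(q, a)(q2).
    Returns None if the transition has probability 0 under the belief."""
    weights = {e: b[e] * memdp[e][(q, a)].get(q2, Fraction(0)) for e in b}
    total = sum(weights.values())
    if total == 0:
        return None
    return {e: w / total for e, w in weights.items()}


def support(b: Dict[str, Fraction]) -> Set[str]:
    """Environments with positive probability under the belief."""
    return {e for e, p in b.items() if p > 0}


memdp: MEMDP = {
    "e1": {("s", "a"): {"s": Fraction(1, 2), "t": Fraction(1, 2)}},
    "e2": {("s", "a"): {"s": Fraction(1)}},
}
b0 = {"e1": Fraction(1, 2), "e2": Fraction(1, 2)}
b1 = belief_update(memdp, b0, "s", "a", "s")  # e1: 1/3, e2: 2/3
b2 = belief_update(memdp, b0, "s", "a", "t")  # only e1 explains q2 = "t"
print(b1)
print(b2)
```

Exact `Fraction` arithmetic keeps the example free of rounding. The update can only remove environments from the support (those that give the observed transition probability 0) and never adds any, as the second update shows.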
