Optimising expectation with guarantees for window mean payoff in Markov decision processes

Pranshu Gaba; Shibashis Guha

TL;DR

This work addresses strategy synthesis in Markov decision processes (MDPs) to maximize the window mean-payoff value in expectation while guaranteeing a minimum performance under one of three guarantee notions: BWC (sure), BAS (almost-sure), and BPT (probabilistic). It distinguishes the fixed window (FWMP) objective from the bounded window (BWMP) objective and develops dedicated algorithms for each: FWMP with guarantees is solvable in polynomial time, while BWMP with guarantees lies in NP ∩ coNP, matching the complexity of the expectation problems without guarantees. A key methodological theme is the maximal end-component (MEC) decomposition, together with MEC collapsing and linear-programming reductions to mean-payoff problems, which enable the synthesis of ε-optimal strategies that are finite-memory and, for BPT, randomized in general. The results yield synthesis methods that simultaneously guarantee a minimum performance and optimize the expected payoff, which is relevant to robust design against both adversarial (worst-case) and stochastic environments.
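Since the MEC decomposition recurs throughout, a minimal Python sketch of the classical iterative SCC-refinement algorithm for computing MECs may help. It assumes a hypothetical encoding in which `mdp[v][a]` lists the `(successor, probability)` pairs of action `a` at vertex `v`; the paper's own MDP model (for instance, a split into player and random vertices) may differ, and `sccs` and `mec_decomposition` are illustrative names, not the authors' implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: mdp[v][a] = list of (successor, probability) pairs.
MDP = Dict[str, Dict[str, List[Tuple[str, float]]]]


def sccs(nodes: Set[str], succ: Dict[str, Set[str]]) -> List[Set[str]]:
    """Strongly connected components of the graph restricted to `nodes` (Tarjan)."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    comps: List[Set[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)  # DFS discovery number
        stack.append(v)
        on_stack.add(v)
        for u in succ[v]:
            if u not in nodes:
                continue  # edges leaving the current sub-MDP are ignored
            if u not in index:
                visit(u)
                low[v] = min(low[v], low[u])
            elif u in on_stack:
                low[v] = min(low[v], index[u])
        if low[v] == index[v]:  # v is the root of a component
            comp: Set[str] = set()
            while True:
                u = stack.pop()
                on_stack.discard(u)
                comp.add(u)
                if u == v:
                    break
            comps.append(comp)

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return comps


def mec_decomposition(mdp: MDP) -> List[Tuple[Set[str], Dict[str, Set[str]]]]:
    """Return each MEC as (states, enabled actions per state)."""
    enabled = {v: set(actions) for v, actions in mdp.items()}
    while True:
        alive = {v for v in mdp if enabled[v]}  # states that still have an action
        succ = {v: {u for a in enabled[v] for u, p in mdp[v][a] if p > 0} for v in alive}
        comps = sccs(alive, succ)
        comp_of = {v: i for i, c in enumerate(comps) for v in c}
        changed = False
        for v in alive:
            for a in list(enabled[v]):
                # An action that may leave v's SCC (or reach a dead state) cannot
                # belong to any end component, so it is removed.
                if any(p > 0 and comp_of.get(u) != comp_of[v] for u, p in mdp[v][a]):
                    enabled[v].discard(a)
                    changed = True
        if not changed:
            # Every remaining SCC is closed under its enabled actions: a MEC.
            return [(c, {v: set(enabled[v]) for v in c}) for c in comps]


if __name__ == "__main__":
    # s0 can loop via "b" or gamble via "a"; s2 is transient.
    example: MDP = {
        "s0": {"a": [("s1", 0.5), ("s2", 0.5)], "b": [("s0", 1.0)]},
        "s1": {"a": [("s1", 1.0)]},
        "s2": {"a": [("s0", 1.0)]},
    }
    for states, actions in mec_decomposition(example):
        print(sorted(states), actions)
```

Each round removes actions that may leave their current SCC; states that lose all actions drop out, and at the fixpoint every remaining SCC is a MEC. In the example, the MECs are {s0} (with action b only) and {s1}, while s2 belongs to no MEC. The recursive Tarjan search is adequate for small examples but would need an iterative version for large MDPs.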

Abstract

The window mean-payoff objective strengthens the classical mean-payoff objective by computing the mean payoff over a finite window that slides along an infinite path. Two variants have been considered: in one, the maximum window length is fixed and given, while in the other, it is not fixed but is required to be bounded. In this paper, we study the problem of synthesising strategies in Markov decision processes that maximise the window mean-payoff value in expectation while simultaneously guaranteeing that the value is above a certain threshold. We solve the synthesis problem for three kinds of guarantees: sure (which must be satisfied in the worst case, that is, against an adversarial environment), almost-sure (which must be satisfied with probability one), and probabilistic (which must be satisfied with at least some given probability $p$). We show that for the fixed window mean-payoff objective, all three problems are in $\mathsf{PTIME}$, while for the bounded window mean-payoff objective, they are in $\mathsf{NP} \cap \mathsf{coNP}$, and thus have the same complexity as maximising the expected performance without any guarantee. Moreover, we show that pure finite-memory strategies suffice for maximising the expectation with sure and almost-sure guarantees, whereas for maximising the expectation with a probabilistic guarantee, randomised strategies are necessary in general.
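To make the sliding-window idea concrete, the sketch below computes the fixed window mean-payoff value of an ultimately periodic run. It assumes the standard definition in which a window opened at position $i$ closes within $\ell$ steps if some prefix of length $j \le \ell$ has mean payoff at least the threshold $\lambda$, and in which the condition only needs to hold from some point on. Under these assumptions, only positions on the cycle matter, and the value is the minimum over cycle positions of the best window mean reachable within $\ell$ steps. The function name `fixed_window_value` is hypothetical, and the paper's precise definitions may differ in details such as how window length is counted.

```python
from fractions import Fraction
from typing import Sequence


def fixed_window_value(cycle: Sequence[int], ell: int) -> Fraction:
    """Fixed window mean-payoff value of a run that eventually repeats `cycle`.

    Assumed definition (hypothetical encoding): the run satisfies the
    objective with threshold lam iff, from some point on, every position i
    opens a window that closes within `ell` steps, i.e. some j in 1..ell has
    (w_i + ... + w_{i+j-1}) / j >= lam. The largest such lam is the minimum
    over cycle positions i of the maximum over j of that window mean.
    """
    n = len(cycle)
    if n == 0 or ell < 1:
        raise ValueError("need a non-empty cycle and ell >= 1")
    per_position = []
    for i in range(n):
        total = 0
        best_mean = None
        for j in range(1, ell + 1):
            total += cycle[(i + j - 1) % n]  # payoff of the j-th edge in the window
            mean = Fraction(total, j)
            if best_mean is None or mean > best_mean:
                best_mean = mean
        per_position.append(best_mean)
    # The prefix before the cycle is irrelevant, so only cycle positions count.
    return min(per_position)


if __name__ == "__main__":
    print(fixed_window_value([1, -1], 2))  # prints 0
    print(fixed_window_value([1, -1], 1))  # prints -1
```

For the cycle with payoffs 1, -1, the value is 0 with $\ell = 2$ but drops to -1 with $\ell = 1$, because the window opened at the -1 edge cannot reach mean 0 in a single step. Exact fractions avoid rounding issues when comparing window means against a threshold.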

Paper Structure (9 sections, 10 theorems, 3 equations, 4 figures, 2 algorithms)

Introduction
Preliminaries
Problem definition
Expected fixed window mean-payoff value with guarantees
Sure guarantee
Probabilistic guarantee
Almost-sure guarantee
Expected bounded window mean-payoff value with guarantees
Conclusion

Key Result

Lemma 1

Given an MDP $\mathcal{M}$ and a set $T \subseteq V$ of target states, we can compute in polynomial time, for each vertex $v \in V$, the probability $p^*_v = \sup_\sigma \mathsf{Pr}^{\sigma}_{\mathcal{M}, v}(\mathsf{Reach}(T))$ with which the player can ensure visiting $T$. There is an optimal unifor…
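Lemma 1 concerns exact, polynomial-time computation, which is typically done by linear programming. As a lighter-weight illustration of the quantity $p^*_v$ itself, the following Python sketch runs value iteration, whose iterates increase monotonically towards the optimal reachability probabilities from below. The encoding `mdp[v][a]` and the name `max_reach_prob` are assumptions for illustration, not the paper's construction; the stopping rule (a small change between iterates) is a common heuristic and does not by itself certify an ε-accurate answer.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: mdp[v][a] = list of (successor, probability) pairs.
MDP = Dict[str, Dict[str, List[Tuple[str, float]]]]


def max_reach_prob(mdp: MDP, target: Set[str], tol: float = 1e-12,
                   max_iter: int = 100_000) -> Dict[str, float]:
    """Approximate p*_v = sup_sigma Pr(Reach(T)) by value iteration from below."""
    x = {v: 1.0 if v in target else 0.0 for v in mdp}
    for _ in range(max_iter):
        new = {}
        for v in mdp:
            if v in target:
                new[v] = 1.0  # target vertices are reached immediately
            else:
                # Bellman update: the player picks the action with the best
                # expected value of the current estimate.
                new[v] = max(
                    (sum(p * x[u] for u, p in succs) for succs in mdp[v].values()),
                    default=0.0,
                )
        delta = max(abs(new[v] - x[v]) for v in mdp)
        x = new
        if delta < tol:  # heuristic stop: small change, not a certified error bound
            break
    return x


if __name__ == "__main__":
    example: MDP = {
        "s": {
            "gamble": [("t", 0.4), ("sink", 0.6)],
            "retry": [("s", 0.5), ("t", 0.25), ("sink", 0.25)],
        },
        "t": {"stay": [("t", 1.0)]},
        "sink": {"stay": [("sink", 1.0)]},
    }
    print(max_reach_prob(example, {"t"}))
```

In the example, retrying beats the one-shot gamble: $p^*_s$ is the solution of $p = \max(0.4,\ 0.5p + 0.25)$, namely $0.5$, and the iterates $0, 0.4, 0.45, 0.475, \dots$ approach it from below.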

Figures (4)

Figure 1: An example of an MDP.
Figure 2: The $\varphi_{\mathsf{BWMP}}$-value of the run $\pi'$ is $0$, but $\pi'$ does not belong to $\mathsf{BWMP}(0)$.
Figure 3: The edge $(v_{-1}, u_0)$ has payoff $1$ and edge $(v_m, u_m)$ has payoff $3$. Every other edge has payoff $-1$.
Figure 4: An example of an MDP for $\mathsf{BPT}((0.5,0),2)$ with $\ell = 2$.

Theorems & Definitions (13)

Lemma 1: Optimal reachability [BK08]
Lemma 2: Long-run appearance in MECs [BK08]
Proposition 3
Proposition 4
Example 5
Lemma 6
Example 7
Theorem 8
Example 9
Lemma 10
...and 3 more
