Finite-memory Strategies for Almost-sure Energy-MeanPayoff Objectives in MDPs

Mohan Dantam; Richard Mayr

Finite-memory Strategies for Almost-sure Energy-MeanPayoff Objectives in MDPs

Mohan Dantam, Richard Mayr

TL;DR

This work studies finite-state MDPs under the Energy-MeanPayoff objective, requiring nonnegative energy in the first dimension and strictly positive mean payoff in the remaining dimensions. It proves that almost-sure winning strategies exist with finite memory, and that an exponential amount of memory is both sufficient (even for deterministic strategies) and necessary (even for randomized strategies). The results extend to multidimensional reward settings and establish pseudo-polynomial decidability for the existence of almost-sure winning strategies. The analysis introduces a two-phase Gain/Bailout strategy framework and a reduction to an augmented MDP to bound memory requirements, offering concrete guidance for synthesizing finite-memory controllers under energy-constrained long-run performance criteria.

Abstract

We consider finite-state Markov decision processes with the combined Energy-MeanPayoff objective. The controller tries to avoid running out of energy while simultaneously attaining a strictly positive mean payoff in a second dimension. We show that finite memory suffices for almost surely winning strategies for the Energy-MeanPayoff objective. This is in contrast to the closely related Energy-Parity objective, where almost surely winning strategies require infinite memory in general. We show that exponential memory is sufficient (even for deterministic strategies) and necessary (even for randomized strategies) for almost surely winning Energy-MeanPayoff. The upper bound holds even if the strictly positive mean payoff part of the objective is generalized to multidimensional strictly positive mean payoff. Finally, it is decidable in pseudo-polynomial time whether an almost surely winning strategy exists.

Finite-memory Strategies for Almost-sure Energy-MeanPayoff Objectives in MDPs

TL;DR

Abstract

Paper Structure (14 sections, 6 theorems, 1 equation)

This paper contains 14 sections, 6 theorems, 1 equation.

Introduction
Background.
Related work.
Our contribution.
Preliminaries
MDPs and Markov chains.
Strategies.
Measure.
Objectives.
Energy/Reward/Counter-based objectives.
Multidimensional reward-based objectives.
Size of an instance.
The Main Result
Proof of finite memory strategy existence

Key Result

Theorem 1

Let ${\mathcal{M}} = \mleft(S,S_\Box,S_\ocircle,{E},P,\bm{{r}}\mright)$ be an MDP with $d$-dimensional rewards on the edges $\bm{{r}}: {E} \to [-R,R]^d$. For the multidimensional Energy-MeanPayoff objective $\mathtt{EN}_1(k)\,\cap\,\bm{{\operatorname{\mathtt{MP}}_{[2,d]}\mleft(> 0\mright)}}$ the fol

Theorems & Definitions (7)

Theorem 1
Definition 2
Lemma 3
Lemma 4
Lemma 5
Lemma 6
Lemma 7

Finite-memory Strategies for Almost-sure Energy-MeanPayoff Objectives in MDPs

TL;DR

Abstract

Finite-memory Strategies for Almost-sure Energy-MeanPayoff Objectives in MDPs

Authors

TL;DR

Abstract

Table of Contents

Key Result

Theorems & Definitions (7)