Finite-memory Strategies for Almost-sure Energy-MeanPayoff Objectives in MDPs
Mohan Dantam, Richard Mayr
TL;DR
This work studies finite-state MDPs under the Energy-MeanPayoff objective, requiring nonnegative energy in the first dimension and strictly positive mean payoff in the remaining dimensions. It proves that almost-sure winning strategies exist with finite memory, and that an exponential amount of memory is both sufficient (even for deterministic strategies) and necessary (even for randomized strategies). The results extend to multidimensional reward settings and establish pseudo-polynomial decidability for the existence of almost-sure winning strategies. The analysis introduces a two-phase Gain/Bailout strategy framework and a reduction to an augmented MDP to bound memory requirements, offering concrete guidance for synthesizing finite-memory controllers under energy-constrained long-run performance criteria.
Abstract
We consider finite-state Markov decision processes with the combined Energy-MeanPayoff objective. The controller tries to avoid running out of energy while simultaneously attaining a strictly positive mean payoff in a second dimension. We show that finite memory suffices for almost surely winning strategies for the Energy-MeanPayoff objective. This is in contrast to the closely related Energy-Parity objective, where almost surely winning strategies require infinite memory in general. We show that exponential memory is sufficient (even for deterministic strategies) and necessary (even for randomized strategies) for almost surely winning Energy-MeanPayoff. The upper bound holds even if the strictly positive mean payoff part of the objective is generalized to multidimensional strictly positive mean payoff. Finally, it is decidable in pseudo-polynomial time whether an almost surely winning strategy exists.
