Multiple Mean-Payoff Optimization under Local Stability Constraints
David Klaška, Antonín Kučera, Vojtěch Kůr, Vít Musil, Vojtěch Řehák
TL;DR
The work tackles local stability in mean-payoff optimization by using window mean payoffs to bound short-horizon performance while jointly optimizing multiple objectives. It proposes WinMPsynt, an algorithm based on differentiable programming. WinMPsynt uses dynamic programming to compute the expected window mean payoffs and the gradient of the objective function Eval with respect to the strategy parameters. This enables gradient-driven improvement of finite-memory randomized strategies in Markov decision processes. The authors prove that the problem is NP-hard in general. They nevertheless show that the algorithm scales in practice and produces high-quality strategies on nontrivial instances, owing to the decomposability of Eval and efficient DP computations. Experiments on structured graphs show substantial speedups over naive baselines and show that memory and randomization help achieve near-optimal window payoffs. This highlights the method's practical relevance for dependable autonomous systems.
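To make the idea of "dynamic programming plus gradients" concrete, here is a minimal Python sketch. It rests on simplifying assumptions that are ours, not the paper's, and it is not WinMPsynt itself:

* a hypothetical two-state MDP;
* a memoryless randomized strategy with a single parameter `theta` (mapped to a probability by a sigmoid);
* a fixed start state instead of a stationary distribution;
* a hypothetical penalty `max(0, target - window_average)**2` standing in for Eval.

A DP over (state, accumulated reward) pairs yields the expected penalty over one window. Forward-mode dual numbers carry the derivative with respect to `theta` through the same computation.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Forward-mode dual number: a value plus its derivative w.r.t. theta."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(float(other))
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(float(other))
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __rsub__(self, other):
        # Computes `other - self` for a plain number `other`.
        return Dual(float(other) - self.val, -self.der)


def sigmoid(x: Dual) -> Dual:
    s = 1.0 / (1.0 + math.exp(-x.val))
    return Dual(s, x.der * s * (1.0 - s))


def chain(theta: Dual):
    """Hypothetical MDP under a randomized strategy: {state: [(prob, reward, next)]}.

    In state 0, 'stay' (reward 2, back to 0) is played with probability
    sigmoid(theta), otherwise 'leave' (reward 0, go to 1). State 1 returns
    to state 0 with reward 1.
    """
    p_stay = sigmoid(theta)
    return {0: [(p_stay, 2, 0), (1 - p_stay, 0, 1)],
            1: [(Dual(1.0), 1, 0)]}


def expected_window_penalty(theta: Dual, window: int, target: float, start: int = 0) -> Dual:
    """DP over (state, accumulated reward) for `window` steps from `start`.

    Returns E[max(0, target - window_average)^2] as a Dual; `.der` is the
    derivative of that expectation w.r.t. theta.
    """
    trans = chain(theta)
    dist = {(start, 0): Dual(1.0)}
    for _ in range(window):
        nxt = defaultdict(lambda: Dual(0.0))
        for (s, total), prob in dist.items():
            for p, r, t in trans[s]:
                nxt[(t, total + r)] = nxt[(t, total + r)] + prob * p
        dist = nxt
    result = Dual(0.0)
    for (_, total), prob in dist.items():
        shortfall = max(0.0, target - total / window)
        result = result + prob * (shortfall ** 2)
    return result


# Usage: plain gradient descent on theta to reduce the expected penalty.
theta = 0.0
for _ in range(50):
    ev = expected_window_penalty(Dual(theta, 1.0), window=4, target=1.5)
    theta -= 0.5 * ev.der
final = expected_window_penalty(Dual(theta), window=4, target=1.5).val
```

In this toy MDP, staying always earns more than leaving, so the gradient pushes `theta` upward and the expected penalty decreases. The DP's state space grows with the number of distinct accumulated rewards, which is why any practical method must exploit more structure than this sketch does.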
Abstract
The long-run average payoff per transition (mean payoff) is the main tool for specifying the performance and dependability properties of discrete systems. The problem of constructing a controller (strategy) that simultaneously optimizes several mean payoffs has been studied in depth for stochastic and game-theoretic models. One common issue with the constructed controllers is the instability of the mean payoffs, measured by the deviations of the average rewards per transition computed in a finite "window" sliding along a run. Unfortunately, the problem of simultaneously optimizing the mean payoffs under local stability constraints is computationally hard, and existing works do not provide a practically usable algorithm even for non-stochastic models such as two-player games. In this paper, we design and evaluate the first efficient and scalable solution to this problem applicable to Markov decision processes.
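The sliding-window notion of instability can be illustrated on a finite run. The Python sketch below uses hypothetical function names. It computes the average reward per transition in every window of a fixed length. As one possible instability measure, assumed here only for illustration, it then takes the largest absolute deviation of those window averages from the run's overall average. Both example runs have the same average reward of 0.5, yet they differ sharply in local stability.

```python
def window_averages(rewards, window):
    """Average reward per transition in each length-`window` window sliding along a finite run."""
    if window <= 0 or window > len(rewards):
        raise ValueError("window must be in 1..len(rewards)")
    total = sum(rewards[:window])
    averages = [total / window]
    for i in range(window, len(rewards)):
        # Slide the window one step: add the new reward, drop the oldest one.
        total += rewards[i] - rewards[i - window]
        averages.append(total / window)
    return averages


def max_local_deviation(rewards, window):
    """Largest |window average - run average|, a hypothetical instability measure."""
    mean = sum(rewards) / len(rewards)
    return max(abs(a - mean) for a in window_averages(rewards, window))


run_a = [1, 0] * 8                    # alternating rewards
run_b = [1, 1, 1, 1, 0, 0, 0, 0] * 2  # rewards arrive in bursts
print(max_local_deviation(run_a, 4))  # 0.0: every window averages 0.5
print(max_local_deviation(run_b, 4))  # 0.5: window averages swing between 0.0 and 1.0
```

A controller whose runs look like `run_b` meets the same mean-payoff target as one producing `run_a`, but it can fall far below that target over short horizons. Local stability constraints are meant to rule out this kind of behavior.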
