An Objective Improvement Approach to Solving Discounted Payoff Games
Daniele Dell'Erba, Arthur Dumas, Sven Schewe
TL;DR
The paper introduces a fully symmetric objective-improvement approach for discounted payoff games that keeps the entire system of edge inequations while optimizing an objective based on one fixed outgoing edge per vertex. By iteratively solving linear programs and updating either the solution or the chosen edges, the method drives the solution toward co-optimal strategies without privileging either player. The authors formalize the algorithm, analyze conditions under which improvements are guaranteed (sharp/improving games), propose perturbations to ensure progress, and compare the approach experimentally with strategy improvement. The work suggests a viable third paradigm for solving symmetric payoff games, with potential implications for tractability and broader applicability to parity and mean-payoff variants.
Abstract
While discounted payoff games and classic games that reduce to them, like parity and mean-payoff games, are symmetric, their solutions are not. We have taken a fresh view on the properties that optimal solutions need to have, and devised a novel way to converge to them, which is entirely symmetric. We achieve this by building a constraint system that uses every edge to define an inequation, and by updating the objective function by taking a single outgoing edge for each vertex into account. These edges loosely represent strategies of both players, where the objective function intuitively asks to make the inequations for these edges sharp. Where they are not sharp, there is an 'error' represented by the difference between the two sides of the inequation, which is 0 where the inequation is sharp. Hence, the objective is to minimise the sum of these errors. For co-optimal strategies, and only for them, it can be achieved that all selected inequations are sharp or, equivalently, that the sum of these errors is zero. As long as no co-optimal strategies have been found, we step-wise improve the error by improving the solution for a given objective function or by improving the objective function for a given solution. This also challenges the gospel that methods for solving payoff games are either based on strategy improvement or on value iteration.
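To make the notion of 'error' concrete, the following is a minimal sketch on a hypothetical two-vertex game. The abstract does not spell out the inequations, so the form used here is an assumption: a maximiser vertex v requires val(v) >= w + d * val(v') for each outgoing edge to v' with weight w, a minimiser vertex requires the reverse inequality, and d is a single discount factor. The names `OWNER`, `EDGES`, `offset`, `is_feasible` and `total_error` are illustrative only, and the linear-programming step that would minimise the error is omitted.

```python
from fractions import Fraction

# Hypothetical toy game (an assumption for illustration, not from the paper).
DISCOUNT = Fraction(1, 2)
OWNER = {"a": "max", "b": "min"}
EDGES = [
    ("a", "a", Fraction(1)),  # (source, target, weight)
    ("a", "b", Fraction(0)),
    ("b", "a", Fraction(0)),
    ("b", "b", Fraction(2)),
]

def offset(val, edge):
    """Gap between the two sides of the edge's inequation (0 means sharp)."""
    src, dst, weight = edge
    rhs = weight + DISCOUNT * val[dst]
    # Assumed orientation: maximiser needs val >= rhs, minimiser val <= rhs.
    return val[src] - rhs if OWNER[src] == "max" else rhs - val[src]

def is_feasible(val):
    """A valuation is feasible if it satisfies the inequation of every edge."""
    return all(offset(val, e) >= 0 for e in EDGES)

def total_error(val, selected):
    """Sum of gaps over the one selected outgoing edge per vertex."""
    return sum(offset(val, e) for e in EDGES if selected[e[0]] == e[1])

# The valuation of this toy game under optimal play, computed by hand.
OPTIMAL = {"a": Fraction(2), "b": Fraction(1)}
```

For this toy game, `OPTIMAL` is feasible, selecting the edges a to a and b to a gives `total_error` equal to 0, and swapping the maximiser's choice to a to b gives 3/2. This mirrors the statement that the error sum vanishes exactly when the selected edges form co-optimal strategies.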
