Smoothed analysis of deterministic discounted and mean-payoff games

Bruno Loff; Mateusz Skomra

TL;DR

This work studies the smoothed-analysis complexity of deterministic two-player discounted and mean-payoff games. It develops a policy-iteration framework based on bias-induced policies and the ergodic equation, and proves that, when payoffs are perturbed by Gaussian noise (or drawn from more general distributions controlled by a condition number $\Delta$), the algorithm runs, with high probability, in time polynomial in the input size and the perturbation parameter. Central to the approach are a mean-payoff condition number $\Delta(r)$ and the identification of Blackwell-optimal (bias-induced) policies that are unique with high probability, which allows the algorithm to progress in polynomial time through increasing discount factors. The results contrast with known smoothed-analysis counterexamples for the stochastic case and give a rigorous algorithmic route to efficient performance on random instances of deterministic two-player games. The work also outlines open questions for non-ergodic graphs and broader model extensions.

Abstract

We devise a policy-iteration algorithm for deterministic two-player discounted and mean-payoff games that runs in polynomial time, with high probability, on any input where each payoff is chosen independently from a sufficiently random distribution. This includes the case where an arbitrary set of payoffs has been perturbed by a Gaussian, showing for the first time that deterministic two-player games can be solved efficiently, in the sense of smoothed analysis. More generally, we devise a condition number for deterministic discounted and mean-payoff games, and show that our algorithm runs in time polynomial in this condition number. Our result confirms a previous conjecture of Boros et al., which was claimed as a theorem and later retracted. It stands in contrast with a recent counter-example by Christ and Yannakakis, showing that Howard's policy-iteration algorithm does not run in smoothed polynomial time on stochastic single-player mean-payoff games. Our approach is inspired by the analysis of random optimal assignment instances by Frieze and Sorkin, and the analysis of bias-induced policies for mean-payoff games by Akian, Gaubert and Hochart.
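To make the input model concrete, here is a minimal sketch of a smoothed instance: payoffs on the edges of a deterministic two-player graph are perturbed by independent Gaussian noise, and the resulting discounted game is solved. The solver is plain value iteration, used only as a simple stand-in; it is not the policy-iteration algorithm of the paper. The names `perturb`, `solve_discounted`, `owner`, `sigma` and `gamma`, and the convention $v(i) = \mathrm{opt}_j\,[r_{ij} + \gamma v(j)]$, are assumptions made for this illustration.

```python
import random


def perturb(payoffs, sigma, rng):
    """Add independent Gaussian noise N(0, sigma^2) to every edge payoff."""
    return {edge: r + rng.gauss(0.0, sigma) for edge, r in payoffs.items()}


def solve_discounted(owner, payoffs, gamma, tol=1e-10):
    """Value iteration for v(i) = opt_j [ r_ij + gamma * v(j) ].

    owner[i] is "max" or "min" (hypothetical encoding of who controls node i);
    payoffs maps each edge (i, j) to its payoff r_ij.
    Returns (values, policy), where policy[i] is the successor chosen at i.
    """
    succ = {i: [] for i in owner}
    for (i, j) in payoffs:
        succ[i].append(j)

    v = {i: 0.0 for i in owner}
    while True:
        new_v, policy = {}, {}
        for i in owner:
            pick = max if owner[i] == "max" else min
            j = pick(succ[i], key=lambda t: payoffs[(i, t)] + gamma * v[t])
            policy[i] = j
            new_v[i] = payoffs[(i, j)] + gamma * v[j]
        # The update is a gamma-contraction, so this loop terminates.
        if max(abs(new_v[i] - v[i]) for i in owner) < tol:
            return new_v, policy
        v = new_v


# Two-node example: node 0 belongs to Max, node 1 to Min.
owner = {0: "max", 1: "min"}
payoffs = {(0, 0): 0.5, (0, 1): 1.0, (1, 0): -1.0, (1, 1): 0.0}

values, policy = solve_discounted(owner, payoffs, gamma=0.9)

# Smoothed instance: the same graph with Gaussian-perturbed payoffs.
noisy = perturb(payoffs, sigma=0.01, rng=random.Random(0))
noisy_values, noisy_policy = solve_discounted(owner, noisy, gamma=0.9)
```

In the unperturbed example both players prefer their self-loops, so the values are $5$ at node 0 and $0$ at node 1. Value iteration approximates the values only up to the tolerance `tol`, and its running time grows as `gamma` approaches 1, which is one reason the paper works with policy iteration instead.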

Paper Structure (29 sections, 45 theorems, 73 equations, 15 figures)

Introduction
A history of discounted and mean-payoff games
Algorithms
Policy iteration versus the simplex method
Smoothed analysis
Computational complexity
A previous approach and our approach
Related work
Outline of the paper
Technical summary
Toy example: deterministic Markov Decision Processes
Deterministic mean-payoff games
Definition of mean-payoff games
A two-player mean-payoff game with exponential number of breakpoints
Further differences between one-player and two-player games
...and 14 more sections

Key Result

Theorem 1.1

There exists a policy-iteration algorithm for solving $n$-state deterministic two-player (discounted or mean-payoff) games, which runs in time $\mathrm{poly}(\phi \cdot n)$ with high probability, on an input where normalized payoffs in $[-1,1]$ have been independently perturbed by a Gaussian with me

Figures (15)

Figure 1: A hierarchy of $\NP$ search problems. $\DetG$ and $\StG$ refer to deterministic, respectively stochastic, two-player games. Arrows denote inclusion or containment. By inclusion of a search problem in the classes $\UP \cap \coUP$ and $\NP \cap \coUP$ of decision problems, we mean that the problem of deciding each bit of the unique answer can be computed in these classes.
Figure 2: Example of a mean-payoff game with a non-convex value function. Nodes controlled by Min are depicted by circles and the node controlled by Max is depicted by a square. Edges without numbers have weight $0$.
Figure 3: Example of a mean-payoff game in which the optimal policies change exponentially many times.
Figure 4: Example of a mean-payoff game in which new optimal policies appear after a small perturbation of weights.
Figure 5: Example of a mean-payoff game in which optimal policies are not stable under perturbations.
...and 10 more figures

Theorems & Definitions (100)

Theorem 1.1: our main theorem
Theorem 1.2: generalization
Proposition 1.3: cf. generic_uniqueness
Definition 1.4
Lemma 1.5
Lemma 1.6
Theorem 1.7
Theorem 1.8
Theorem 1.9
Theorem 1.10
...and 90 more
