
Smoothed analysis of deterministic discounted and mean-payoff games

Bruno Loff, Mateusz Skomra

TL;DR

This work studies the smoothed complexity of deterministic two-player discounted and mean-payoff games. It develops a policy-iteration framework based on bias-induced policies and the ergodic equation, and proves that, under Gaussian perturbations of the payoffs (or more general distributions controlled by a condition number $\Delta$), the algorithm runs with high probability in time polynomial in the input size and the perturbation parameter. Central to the approach are a mean-payoff condition number $\Delta(r)$ and the observation that Blackwell-optimal (bias-induced) policies are unique with high probability, which allows the algorithm to progress in polynomial time through increasing discount factors. The results contrast with a known counterexample showing that Howard's policy iteration does not run in smoothed polynomial time on stochastic single-player mean-payoff games, and they give a rigorous guarantee of efficient performance on randomly perturbed instances of deterministic two-player games. The work also outlines open questions for non-ergodic graphs and broader model extensions.

Abstract

We devise a policy-iteration algorithm for deterministic two-player discounted and mean-payoff games that runs in polynomial time with high probability on any input where each payoff is chosen independently from a sufficiently random distribution. This includes the case where an arbitrary set of payoffs has been perturbed by a Gaussian, showing for the first time that deterministic two-player games can be solved efficiently in the sense of smoothed analysis. More generally, we devise a condition number for deterministic discounted and mean-payoff games, and show that our algorithm runs in time polynomial in this condition number. Our result confirms a previous conjecture of Boros et al., which was claimed as a theorem and later retracted. It stands in contrast with a recent counter-example by Christ and Yannakakis, showing that Howard's policy-iteration algorithm does not run in smoothed polynomial time on stochastic single-player mean-payoff games. Our approach is inspired by the analysis of random optimal assignment instances by Frieze and Sorkin, and the analysis of bias-induced policies for mean-payoff games by Akian, Gaubert and Hochart.
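
To make the input model concrete, here is a minimal Python sketch. It is not the paper's algorithm: it perturbs each payoff with independent Gaussian noise and then solves the resulting discounted game by plain value iteration, which is swapped in here only because it is a simple, correct solver for a fixed discount factor. The function names `perturb` and `solve_discounted`, the dictionary-based game encoding, the sample game, and the parameters `sigma`, `gamma`, and `tol` are all assumptions made for illustration. The sketch also assumes the common convention that a state's value is the max (or min) over outgoing edges of the edge payoff plus `gamma` times the successor's value; the paper may normalize differently.

```python
import random

def perturb(payoffs, sigma, rng):
    """Add independent N(0, sigma^2) noise to every edge payoff (illustrative input model)."""
    return {edge: r + rng.gauss(0.0, sigma) for edge, r in payoffs.items()}

def solve_discounted(owner, payoffs, gamma, tol=1e-12):
    """Value iteration for a deterministic two-player discounted game.

    owner[i] is "max" or "min"; payoffs maps an edge (i, j) to its payoff.
    Every state is assumed to have at least one outgoing edge.
    Returns the (approximate) value vector and a greedy positional policy.
    """
    succ = {i: [] for i in owner}
    for (i, j) in payoffs:
        succ[i].append(j)

    value = {i: 0.0 for i in owner}
    while True:
        new = {}
        for i in owner:
            pick = max if owner[i] == "max" else min
            new[i] = pick(payoffs[i, j] + gamma * value[j] for j in succ[i])
        delta = max(abs(new[i] - value[i]) for i in owner)
        value = new
        if delta < tol:  # the update is a gamma-contraction, so this terminates
            break

    # Read off one greedy successor per state from the converged values.
    policy = {}
    for i in owner:
        pick = max if owner[i] == "max" else min
        policy[i] = pick(succ[i], key=lambda j: payoffs[i, j] + gamma * value[j])
    return value, policy

# Hypothetical two-state game: state 0 belongs to Max, state 1 to Min.
owner = {0: "max", 1: "min"}
payoffs = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): -1.0, (1, 1): 0.5}

rng = random.Random(0)
noisy_payoffs = perturb(payoffs, sigma=0.1, rng=rng)
value, policy = solve_discounted(owner, noisy_payoffs, gamma=0.5)
print(value, policy)
```

On the unperturbed sample game with `gamma=0.5`, the values are 2/3 at state 0 and -2/3 at state 1, with Max moving from 0 to 1 and Min moving from 1 to 0. Value iteration needs more iterations as `gamma` approaches 1, which is one reason the paper analyzes policy iteration instead.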

Paper Structure (29 sections, 45 theorems, 73 equations, 15 figures)


Key Result

Theorem 1.1

There exists a policy-iteration algorithm for solving $n$-state deterministic two-player (discounted or mean-payoff) games, which runs in time $\mathrm{poly}(\phi \cdot n)$ with high probability, on an input where normalized payoffs in $[-1,1]$ have been independently perturbed by a Gaussian with me

Figures (15)

  • Figure 1: A hierarchy of $\mathsf{NP}$ search problems. $\mathrm{DetG}$ and $\mathrm{StG}$ refer to deterministic and stochastic two-player games, respectively. Arrows denote inclusion or containment. By inclusion of a search problem in the decision-problem classes $\mathsf{UP} \cap \mathsf{coUP}$ and $\mathsf{NP} \cap \mathsf{coUP}$, we mean that each bit of the unique answer can be decided within these classes.
  • Figure 2: Example of a mean-payoff game with a non-convex value function. Nodes controlled by Min are depicted by circles and the node controlled by Max is depicted by a square. Edges without numbers have weight $0$.
  • Figure 3: Example of a mean-payoff game in which the optimal policies change exponentially many times.
  • Figure 4: Example of a mean-payoff game in which new optimal policies appear after a small perturbation of weights.
  • Figure 5: Example of a mean-payoff game in which optimal policies are not stable under perturbations.
  • ...and 10 more figures

Theorems & Definitions (100)

  • Theorem 1.1: our main theorem
  • Theorem 1.2: generalization
  • Proposition 1.3: cf. generic_uniqueness
  • Definition 1.4
  • Lemma 1.5
  • Lemma 1.6
  • Theorem 1.7
  • Theorem 1.8
  • Theorem 1.9
  • Theorem 1.10
  • ...and 90 more