Smoothed analysis of deterministic discounted and mean-payoff games
Bruno Loff, Mateusz Skomra
TL;DR
This work studies the smoothed complexity of deterministic two-player discounted and mean-payoff games. It develops a policy-iteration framework based on bias-induced policies and the ergodic equation, and proves that, when the payoffs are perturbed by Gaussian noise (or drawn from more general distributions, handled through a condition number $\Delta(r)$), the algorithm runs with high probability in time polynomial in the input size and the perturbation parameter. Central to the approach are the mean-payoff condition number $\Delta(r)$ and the identification of Blackwell-optimal (bias-induced) policies that are unique with high probability, which lets the algorithm make polynomial-time progress through increasing discount factors. The results contrast with the known counterexample showing that Howard's policy iteration does not run in smoothed polynomial time on stochastic single-player mean-payoff games, and they give a rigorous algorithmic route to efficient performance on randomly perturbed instances of deterministic two-player games. The work also outlines open questions for non-ergodic graphs and broader model extensions.
Abstract
We devise a policy-iteration algorithm for deterministic two-player discounted and mean-payoff games that runs in polynomial time with high probability on any input where each payoff is chosen independently from a sufficiently random distribution. This includes the case where an arbitrary set of payoffs has been perturbed by a Gaussian, showing for the first time that deterministic two-player games can be solved efficiently in the sense of smoothed analysis. More generally, we devise a condition number for deterministic discounted and mean-payoff games, and show that our algorithm runs in time polynomial in this condition number. Our result confirms a previous conjecture of Boros et al., which was claimed as a theorem and later retracted. It stands in contrast with a recent counter-example by Christ and Yannakakis, showing that Howard's policy-iteration algorithm does not run in smoothed polynomial time on stochastic single-player mean-payoff games. Our approach is inspired by the analysis of random optimal assignment instances by Frieze and Sorkin, and the analysis of bias-induced policies for mean-payoff games by Akian, Gaubert and Hochart.
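To make the smoothed-analysis setting concrete, the following minimal sketch perturbs the payoffs of a small deterministic two-player discounted game with independent Gaussian noise and then solves the perturbed game. It is not the policy-iteration algorithm of the paper: plain value iteration is swapped in purely for illustration. The function names `perturb_payoffs` and `solve_discounted_game`, the example graph, and the discounting convention (value = payoff + gamma times the successor value) are all assumptions made for this example.

```python
import random


def perturb_payoffs(payoffs, sigma, seed=0):
    """Add independent N(0, sigma^2) noise to every edge payoff (hypothetical helper)."""
    rng = random.Random(seed)
    return {edge: r + rng.gauss(0.0, sigma) for edge, r in payoffs.items()}


def solve_discounted_game(owner, payoffs, gamma, tol=1e-12, max_iter=100_000):
    """Value iteration for a deterministic two-player discounted game.

    owner[u] is "max" or "min"; payoffs maps each edge (u, v) to a real payoff.
    Assumes every vertex has at least one outgoing edge and 0 <= gamma < 1.
    Returns the (approximate) value of each vertex and a greedy successor choice.
    """
    succ = {u: [] for u in owner}
    for (u, v) in payoffs:
        succ[u].append(v)

    value = {u: 0.0 for u in owner}
    for _ in range(max_iter):
        new_value = {}
        for u in owner:
            choose = max if owner[u] == "max" else min
            new_value[u] = choose(payoffs[(u, v)] + gamma * value[v] for v in succ[u])
        gap = max(abs(new_value[u] - value[u]) for u in owner)
        value = new_value
        if gap < tol:
            break

    # Greedy policy with respect to the computed values.
    policy = {}
    for u in owner:
        choose = max if owner[u] == "max" else min
        policy[u] = choose(succ[u], key=lambda v: payoffs[(u, v)] + gamma * value[v])
    return value, policy


# Illustrative three-vertex game (made up for this sketch).
owner = {0: "max", 1: "min", 2: "max"}
payoffs = {(0, 1): 1.0, (0, 2): 0.0, (1, 0): -1.0,
           (1, 2): 2.0, (2, 0): 0.5, (2, 2): 0.0}

noisy = perturb_payoffs(payoffs, sigma=0.1, seed=1)
value, policy = solve_discounted_game(owner, noisy, gamma=0.9)
```

In this sketch, `sigma` plays the role of the perturbation parameter: the adversary fixes `payoffs`, the noise is added afterwards, and the running time of interest is that of the solver on `noisy`. Value iteration converges at a rate governed by `gamma`, so it does not by itself give the polynomial bound discussed above.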
