Concentration Tail-Bound Analysis of Coevolutionary and Bandit Learning Algorithms
Per Kristian Lehre, Shishen Lin
TL;DR
This work develops a novel recurrence-based drift theorem that yields exponential tail bounds for first hitting times across a broad range of drift regimes (positive, weak, zero, and negative drift) by exploiting variance properties of the process and an extended Optional Stopping Theorem. The framework is then applied to diverse algorithms, producing strong high-probability guarantees: (i) the regret of RWAB concentrates in non-stationary two-armed bandits, and (ii) RLS-PD finds the Nash equilibrium (NE) of the Bilinear max-min benchmark with a concentrated $O(n^{1.5})$ runtime, while also forgetting the NE with high probability (w.h.p.). The authors also derive tail bounds for classical problems such as Random 2-SAT and Graph Colouring, yielding polynomial runtime tails of order $O(n^{4})$. Empirical studies corroborate the theory, showing exponentially decaying tails for runtimes and regrets and highlighting practical implications for algorithm reliability and stability. Overall, the paper offers a general toolkit for sharp runtime and regret concentration in stochastic algorithms via drift recurrences and optional stopping, with clear avenues for future work on stabilizing coevolutionary dynamics and refining bandit strategies.
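As a minimal, hypothetical illustration of why zero drift need not rule out exponential tails, consider an unbiased random walk on $\{0,\dots,n\}$ started at $n$: interior states step $\pm 1$ with probability $1/2$, and at $n$ the walk either steps to $n-1$ or stays put. Its expected hitting time of $0$ from state $x$ is $x(2n+1) - x^2 \le n(n+1)$. By Markov's inequality, every start state gives $\Pr(T > 2n(n+1)) \le 1/2$, and restarting this argument every $2n(n+1)$ steps (Markov property) gives $\Pr(T > 2k\,n(n+1)) \le 2^{-k}$. This is the classical restart argument, not the recurrence-based drift theorem of the paper; the walk, the function names `hitting_time` and `empirical_tails`, and all parameters are our own assumptions. The sketch checks the bound by simulation:

```python
import random

# Hypothetical illustration: a zero-drift walk and the classical
# Markov-inequality restart bound, NOT the paper's drift theorem.


def hitting_time(n: int, rng: random.Random) -> int:
    """First time a zero-drift walk started at n hits 0.

    States 1..n-1 move -1 or +1 with probability 1/2 each; at the upper
    boundary n the walk moves to n-1 or stays at n with probability 1/2 each.
    """
    x, t = n, 0
    while x > 0:
        t += 1
        if rng.random() < 0.5:
            x -= 1
        elif x < n:
            x += 1
        # otherwise x == n and the upward move is blocked: stay at n
    return t


def empirical_tails(n: int, runs: int, ks, seed: int = 0):
    """Return the sample mean of T and empirical P(T > k * 2n(n+1)) for each k."""
    rng = random.Random(seed)
    samples = [hitting_time(n, rng) for _ in range(runs)]
    m = 2 * n * (n + 1)  # twice the exact expected hitting time n(n+1) from n
    mean = sum(samples) / runs
    tails = {k: sum(s > k * m for s in samples) / runs for k in ks}
    return mean, tails


if __name__ == "__main__":
    n = 10
    mean, tails = empirical_tails(n, runs=2000, ks=(1, 2, 3, 4))
    print(f"empirical mean {mean:.1f} vs exact n(n+1) = {n * (n + 1)}")
    for k, p in tails.items():
        print(f"P(T > {k} * 2n(n+1)) ~ {p:.4f}  (bound 2^-{k} = {2 ** -k:.4f})")
```

The seed makes runs reproducible. Because the restart argument is loose, the empirical tail frequencies typically sit well below $2^{-k}$, while the sample mean stays close to the exact value $n(n+1)$.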
Abstract
Runtime analysis, as a branch of the theory of AI, studies how the number of iterations an algorithm takes before finding a solution (its runtime) depends on the design of the algorithm and the structure of the problem. Drift analysis is a state-of-the-art tool for estimating the runtime of randomised algorithms, such as evolutionary and bandit algorithms. Drift refers roughly to the expected progress towards the optimum per iteration. This paper considers the problem of deriving concentration tail bounds on the runtime or regret of algorithms. It provides a novel drift theorem that gives precise exponential tail bounds given positive, weak, zero and even negative drift. Previously, such exponential tail bounds were missing in the case of weak, zero, or negative drift. Our drift theorem can be used to prove strong concentration of the runtime or regret of algorithms in AI. For example, we prove that the regret of the RWAB bandit algorithm is highly concentrated, while previous analyses only considered the expected regret. This means that the algorithm obtains the optimum within a given time frame with high probability, i.e. a form of algorithm reliability. Moreover, our theorem implies that the time needed by the co-evolutionary algorithm RLS-PD to obtain a Nash equilibrium in a Bilinear max-min benchmark problem is highly concentrated. However, we also prove that the algorithm forgets the Nash equilibrium, and that the time until this occurs is highly concentrated. This highlights a weakness of RLS-PD that should be addressed by future work.
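For orientation, the informal notion of drift can be made precise in a standard textbook form. The statement below is the classical additive drift theorem, shown only as background; it is not the new theorem of this paper. Assume a time-homogeneous Markov process $(X_t)_{t \ge 0}$ on a bounded subset of $\mathbb{R}_{\ge 0}$ that measures the distance to the optimum, with first hitting time $T$ of $0$:

$$
\Delta(x) := \mathbb{E}\left[X_t - X_{t+1} \mid X_t = x\right], \qquad T := \min\{t \ge 0 : X_t = 0\},
$$

$$
\Delta(x) \ge \delta > 0 \ \text{ for all } x > 0 \quad \Longrightarrow \quad \mathbb{E}[T \mid X_0] \le \frac{X_0}{\delta}.
$$

With weak drift ($\delta$ close to $0$) this bound becomes very large, with zero or negative drift it does not apply at all, and in every case it controls only the expectation of $T$, not its tail.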
