A General Upper Bound for the Runtime of a Coevolutionary Algorithm on Impartial Combinatorial Games

Alistair Benford, Per Kristian Lehre

TL;DR

This work provides the first runtime analysis of a coevolutionary estimation-of-distribution algorithm (UMDA) on impartial combinatorial games, introducing a game-graph invariant called switchability to characterize learning difficulty. The authors prove a general upper bound: with a suitably large population size $\mu$, the algorithm finds an optimal strategy within $O(\mu\, r(G))$ time with high probability, where $r(G)$ depends on the number of positions $n$, the maximum degree $\Delta$, and the per-vertex switchability $s(v)$ on the critical positions $W_G$. A corollary gives a simpler bound in terms of the maximum switchability, which implies polynomial or quasipolynomial runtimes for many games. The paper applies the bound to Subtraction Nim, Silver Dollar, Turning Turtles, and Chomp, showing how the framework supports a theory-driven understanding of CoEA behavior in turn-based impartial games and can guide future runtime analyses in more complex settings.

Abstract

Due to their complex dynamics, combinatorial games are a key test case and application for algorithms that train game playing agents. Among those algorithms that train using self-play are coevolutionary algorithms (CoEAs). However, the successful application of CoEAs for game playing is difficult due to pathological behaviours such as cycling, an issue especially critical for games with intransitive payoff landscapes. Insight into how to design CoEAs to avoid such behaviours can be provided by runtime analysis. In this paper, we push the scope of runtime analysis for CoEAs to combinatorial games, proving a general upper bound for the number of simulated games needed for UMDA to discover (with high probability) an optimal strategy. This result applies to any impartial combinatorial game, and for many games the implied bound is polynomial or quasipolynomial as a function of the number of game positions. After proving the main result, we provide several applications to simple well-known games: Nim, Chomp, Silver Dollar, and Turning Turtles. As the first runtime analysis for CoEAs on combinatorial games, this result is a critical step towards a comprehensive theoretical framework for coevolution.

Paper Structure (15 sections, 16 theorems, 90 equations, 4 figures, 1 algorithm)

Key Result

Theorem 1.1

Let $\mathcal{A}$ be the coevolutionary algorithm specified in Section \ref{sect:umda}, and let $G$ be an impartial combinatorial game with $n$ possible positions. Then, with high probability, $\mathcal{A}$ discovers an optimal strategy for $G$ within $n^{O(\overline{s})}$ game evaluations, where $\overline{s}$ …
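
A short calculation shows how a bound of the form $n^{O(\overline{s})}$ yields the polynomial and quasipolynomial regimes mentioned in the abstract. It assumes, as the TL;DR suggests, that $\overline{s}$ stands for the maximum switchability. The constants $c$ and $k$ are placeholders introduced here for illustration (with $c$ standing in for the constant hidden in the $O(\cdot)$ notation); neither is a value taken from the paper.

```latex
\begin{align*}
\overline{s} \le k \ \text{(constant)}
  &\implies n^{c\,\overline{s}} \le n^{ck} = \mathrm{poly}(n), \\
\overline{s} \le k \log_2 n
  &\implies n^{c\,\overline{s}} \le n^{ck \log_2 n} = 2^{ck (\log_2 n)^2}
  \quad \text{(quasipolynomial in } n\text{)}.
\end{align*}
```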

Figures (4)

  • Figure 1: In the combinatorial game illustrated above, Sprague-Grundy values at each game position are shown in red. In this game, $W_G=\{v_0,b\}$. However, any strategy $x$ with $x(v_0)=d$ is automatically optimal (the first player wins on their first turn), and so the condition of Lemma \ref{lm:optimality-characterisation} is not a necessary one.
  • Figure 2: An example of switchability.
  • Figure 3: Two illustrations of switchability. In the first, $s(v)=1$, and a $v$-switcher of depth $1$ is shown in blue. In the second, $s(v)=2$, a $v$-switcher of depth $2$ is shown in blue, and one example of an $A$-compatible path is shown in red.
  • Figure 4: A game that should be easy to optimise, but contains vertices with switchability $\Theta(n)$.
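
The Sprague-Grundy values mentioned in the caption of Figure 1 can be computed for any finite impartial game with the minimum-excludant (mex) rule. The sketch below does this for Subtraction Nim, one of the games the paper uses as an application. It is a minimal illustration, not the paper's algorithm: the function names `grundy_values` and `winning_move` and the subtraction set `(1, 2, 3)` are assumptions chosen for the example.

```python
def grundy_values(n_max, moves):
    """Sprague-Grundy values for Subtraction Nim piles of size 0..n_max.

    `moves` is the (assumed) set of allowed numbers of tokens to remove.
    """
    values = []
    for pile in range(n_max + 1):
        # Grundy values of all positions reachable in one move.
        reachable = {values[pile - m] for m in moves if m <= pile}
        # mex: the smallest non-negative integer not in `reachable`.
        g = 0
        while g in reachable:
            g += 1
        values.append(g)
    return values


def winning_move(pile, moves, values):
    """Return a move to a position with Grundy value 0, or None if none exists."""
    for m in moves:
        if m <= pile and values[pile - m] == 0:
            return m
    return None


moves = (1, 2, 3)
values = grundy_values(10, moves)
print(values)
print(winning_move(5, moves, values))
```

A position is a loss for the player to move exactly when its Grundy value is 0, so a strategy that always moves to a value-0 position whenever one is reachable is optimal for this game.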

Theorems & Definitions (39)

  • Theorem 1.1: Corollary \ref{cor:impartial-games}, informal version
  • Definition 2.1
  • Lemma 2.2
  • proof
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof: Proof of Lemma \ref{lm:sampled-individual}
  • Proposition 3.3
  • proof
  • ...and 29 more