
Mastering NIM and Impartial Games with Weak Neural Networks: An AlphaZero-inspired Multi-Frame Approach

Søren Riis

TL;DR

This work introduces a circuit-complexity framework for analyzing fixed-precision neural inference in impartial games, specifically NIM. It shows that $\mathrm{AC}^0$ circuits can simulate such networks, so parity-like computations such as the nim-sum are out of reach for them in a strong sense. It proves a negative result for single-frame agents but demonstrates a practically effective fix: a two-frame history exposes locally computable nimber differences, enabling a restoration rule that keeps the nim-sum at zero without recomputing the global nim-sum. Empirically, two-frame policies reliably learn restoration in a 20-heap, 4-bit NIM setting, achieving near-perfect restoration accuracy and strong performance gains when combined with MCTS, while one-frame models lag behind. The paper further argues that the feasible learning regime corresponds to approximate majority (capturable by $\mathrm{AC}^0$), distinguishes it from exact parity, and offers a general blueprint for turning global parity constraints into local invariants in impartial games and related domains.
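
To make the 20-heap, 4-bit restoration setting concrete, here is a minimal Python sketch of how two-frame training examples could be generated. The sampling scheme and all names (`random_zero_position`, `random_opponent_move`, `restoring_moves`, `make_example`) are hypothetical, not the paper's data pipeline. Labels are computed with the global nim-sum here, since the label oracle itself is not subject to the $\mathrm{AC}^0$ restriction.

```python
import random
from functools import reduce
from operator import xor

N_HEAPS = 20  # N in the figures
K_BITS = 4    # k: heap sizes lie in [0, 2**k - 1]

def nim_sum(heaps):
    """Bitwise XOR of all heap sizes."""
    return reduce(xor, heaps, 0)

def random_zero_position(rng, n=N_HEAPS, k=K_BITS):
    """Hypothetical sampler: draw n-1 heaps uniformly, then fix the last heap so the nim-sum is 0."""
    while True:
        heaps = [rng.randrange(2 ** k) for _ in range(n - 1)]
        heaps.append(nim_sum(heaps))  # XOR of k-bit values is still a k-bit value
        if any(heaps):                # the opponent needs at least one legal move
            return heaps

def random_opponent_move(rng, heaps):
    """Reduce one randomly chosen non-empty heap to a strictly smaller size."""
    i = rng.choice([j for j, h in enumerate(heaps) if h > 0])
    after = list(heaps)
    after[i] = rng.randrange(heaps[i])
    return after

def restoring_moves(heaps):
    """All moves (heap index, new size) that bring the nim-sum back to 0."""
    s = nim_sum(heaps)
    return [(j, h ^ s) for j, h in enumerate(heaps) if (h ^ s) < h]

def make_example(rng):
    """One two-frame example: (previous frame, current frame, restoring moves)."""
    before = random_zero_position(rng)
    after = random_opponent_move(rng, before)
    return before, after, restoring_moves(after)
```

For instance, `rng = random.Random(0)` followed by `[make_example(rng) for _ in range(1000)]` yields a small dataset; the figures use $D=10^6$ examples. Keeping every restoring move as a target, rather than a single one, is an assumption made here because several replies can restore a zero nim-sum.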

Abstract

We introduce a practical circuit-complexity model for fixed-precision neural networks to explain and overcome a persistent learnability barrier in impartial games such as NIM. We show that bounded-depth, polynomial-size, fixed-precision neural inference, including recurrent and attention-style architectures, is simulable by $\mathrm{AC}^0$ circuits. This places such networks below $\mathrm{TC}^0$ and explains their inability to compute exact parity or the nim-sum. On the negative side, we prove that single-frame AlphaZero-style agents with $\mathrm{AC}^0$-constrained networks cannot achieve strong mastery of NIM, even with polynomial-time search, because they cannot represent global parity. On the positive side, we show that augmenting the state with a two-frame history exposes locally computable nimber differences that are $\mathrm{AC}^0$-computable. This enables a local restoration rule: after an opponent move, one can restore the zero nim-sum invariant by matching the observed difference, without recomputing the global nim-sum from scratch. Empirically, our two-frame policy achieves near-perfect restoration accuracy in 20-heap NIM, whereas a one-frame baseline stays near chance. Finally, we justify $\mathrm{AC}^0$ as a model of feasible learnability, distinguishing approximate majority, which is compatible with $\mathrm{AC}^0$ and learnable in practice, from the sharp majority required for parity, which is infeasible under fixed precision and noise.
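
The restoration rule can be sketched in Python, assuming the previous frame had nim-sum 0 and the opponent changed exactly one heap. If the opponent moved heap $i$ from $a$ to $b$, the new nim-sum is exactly $d = a \oplus b$, which depends only on that heap's $k$ bits in the two frames; any heap with $h_j \oplus d < h_j$ can then be reduced to $h_j \oplus d$. The function names `observed_difference` and `restore` are illustrative, not the paper's code.

```python
from typing import List, Optional, Tuple

def observed_difference(prev: List[int], curr: List[int]) -> Tuple[int, int]:
    """Return (i, d): the heap the opponent changed and its nimber difference.

    d = prev[i] XOR curr[i] depends only on one heap's bits in the two frames,
    so no global XOR over all heaps is needed.
    """
    changed = [i for i, (a, b) in enumerate(zip(prev, curr)) if a != b]
    if len(changed) != 1:
        raise ValueError("expected exactly one heap to differ between frames")
    i = changed[0]
    return i, prev[i] ^ curr[i]

def restore(prev: List[int], curr: List[int]) -> Optional[Tuple[int, int]]:
    """Return a move (j, new_size) restoring nim-sum 0, assuming prev XORs to 0.

    Because prev XORs to 0, curr XORs to d, so reducing any heap h_j with
    h_j ^ d < h_j to h_j ^ d brings the nim-sum back to 0.
    """
    _, d = observed_difference(prev, curr)
    for j, h in enumerate(curr):
        if h ^ d < h:  # true exactly when h has d's highest set bit
            return j, h ^ d
    return None  # unreachable when the zero nim-sum assumption on prev holds
```

Such a heap always exists: the current heaps XOR to $d \neq 0$, so an odd number of them carry the highest set bit of $d$, and XOR-ing $d$ into any of these lowers it. For example, `restore([3, 5, 6], [3, 1, 6])` finds $d = 4$ and returns `(2, 2)`, leaving heaps `[3, 1, 2]` with nim-sum 0.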

Paper Structure

This paper contains 68 sections, 8 theorems, 14 equations, 3 figures, 1 table.

Key Result

Theorem 2.2

Fix constants $W,D\in\mathbb{N}$ and a time window $T\in\mathbb{N}$. Let $\{\mathcal{M}_n\}_{n\ge 1}$ be any family of models of type $\mathsf{NN}$, $\mathsf{RNN}$ (with window $T$), or $\mathsf{LTST}$ (with window $T$) such that: … Then the Boolean function computed by $\mathcal{M}_n$ can be computed by an $\mathrm{AC}^0$ circuit family of polynomial size and constant depth (depending only on $L,T$ …

Figures (3)

  • Figure 1: Validation restoration accuracy vs. epoch on the restoration dataset ($N=20$, $k=4$, $D=10^6$). The 1F model stays near chance, while the 2F model exhibits a characteristic two-stage learning curve with an intermediate plateau around 0.54 before reaching near-perfect accuracy.
  • Figure 2: Test restoration accuracy of checkpointed agents ($N=20$, $k=4$, $D=10^6$). Checkpoints are saved at epochs $\{1,23,45,67,89,112,134,156,178,200\}$.
  • Figure 3: Head-to-head Elo of checkpointed agents under MCTS-per-move evaluation ($N=20$, $k=4$). Each move is selected by PUCT with 200 simulations and rollouts using greedy policy play from the two checkpoints.
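
Figure 3 relies on two standard quantities: PUCT child selection inside MCTS and Elo ratings from head-to-head results. The Python sketch below shows the AlphaZero-style PUCT rule and the usual logistic conversion from expected score to Elo difference. The exploration constant `c_puct = 1.5` and the function names are assumptions; the paper's exact evaluation code may differ.

```python
import math
from typing import Sequence

def puct_select(q: Sequence[float], prior: Sequence[float], visits: Sequence[int],
                c_puct: float = 1.5) -> int:
    """Return the index of the child maximising Q + U (AlphaZero-style PUCT).

    U = c_puct * P * sqrt(total parent visits) / (1 + child visits).
    """
    sqrt_total = math.sqrt(sum(visits))
    scores = [qa + c_puct * pa * sqrt_total / (1 + na)
              for qa, pa, na in zip(q, prior, visits)]
    return max(range(len(scores)), key=scores.__getitem__)

def elo_difference(score: float) -> float:
    """Elo gap implied by an expected score in (0, 1) under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must lie strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))
```

With no visits yet, every exploration bonus is zero and the rule falls back to the first child with maximal Q; some implementations add a small constant under the square root so that priors act from the first simulation. A 75% score corresponds to roughly a 191-point Elo gap.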

Theorems & Definitions (28)

  • Definition 2.1: Neural networks with constant precision (informal)
  • Theorem 2.2: Simulation of fixed-precision neural models by $\mathrm{AC}^0$
  • Definition 3.1: Impartial game
  • Definition 3.2: NIM
  • Definition 3.3: Nim-sum
  • Theorem 3.4: Classical NIM characterisation
  • Definition 4.1: Strong vs. weak mastery
  • Theorem 4.2: Single-frame strong mastery requires parity; impossible in $\mathrm{AC}^0$
  • Proof (sketch)
  • Theorem 4.3: Polynomial-time bounded search does not fix single-frame $\mathrm{AC}^0$ limitations
  • ...and 18 more
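
Theorem 3.4 is the classical characterisation of NIM (Bouton's theorem): under normal play, a position is a loss for the player to move exactly when its nim-sum is 0. The short Python check below verifies this by exhaustive game-tree search on small positions. It assumes the normal-play convention (a player with no legal move loses) and is a sanity check on small cases, not a proof.

```python
from functools import lru_cache, reduce
from itertools import product
from operator import xor

@lru_cache(maxsize=None)
def is_losing(heaps: tuple) -> bool:
    """True if the player to move loses under normal play.

    heaps is kept sorted so that equivalent positions share one cache entry.
    """
    for i, h in enumerate(heaps):
        for smaller in range(h):
            child = tuple(sorted(heaps[:i] + (smaller,) + heaps[i + 1:]))
            if is_losing(child):
                return False  # some move hands the opponent a losing position
    return True  # every move (or having no move at all) leaves a win for the opponent

def check_bouton(num_heaps: int = 3, max_size: int = 7) -> bool:
    """Compare game-tree search with the nim-sum rule on all small positions."""
    for heaps in product(range(max_size + 1), repeat=num_heaps):
        assert is_losing(tuple(sorted(heaps))) == (reduce(xor, heaps, 0) == 0), heaps
    return True
```

For example, `check_bouton(3, 7)` enumerates all 512 three-heap positions with sizes up to 7; `(1, 2, 3)` is a loss for the player to move, while `(1, 2, 4)` is a win.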