Mastering NIM and Impartial Games with Weak Neural Networks: An AlphaZero-inspired Multi-Frame Approach
Søren Riis
TL;DR
This work introduces a circuit-precision framework for analyzing fixed-precision neural inference in impartial games, specifically NIM. It shows that $AC^0$ circuits can simulate such networks. Such networks therefore cannot compute exact parity-like functions such as the nim-sum, which makes these functions hard to learn in a strong sense. It proves a negative result for single-frame agents but demonstrates a practical fix: a two-frame history exposes locally computable nimber differences. These differences enable a restoration rule that keeps the nim-sum at zero without recomputing the global nim-sum. Empirically, two-frame policies reliably learn restoration in a 20-heap, 4-bit NIM setting. They achieve near-perfect restoration and strong performance gains when combined with MCTS, while one-frame models lag behind. The paper further argues that the feasible learning regime aligns with approximate majority (which $AC^0$ can capture) and distinguishes it from exact parity. This offers a general blueprint for turning global parity constraints into local invariants in impartial games and related domains.
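To make the parity connection concrete: the nim-sum of a NIM position is the bitwise XOR of its heap sizes, so each bit of the nim-sum is the parity of one bit column across all heaps. The minimal Python sketch below assumes heap sizes are plain integers. The 20-heap, 4-bit sizes mirror the setting described above, and the helper names (`nim_sum`, `column_parities`, `bits_to_int`) are illustrative rather than taken from the paper. It checks that identity and shows that flipping any single input bit flips the nim-sum. This is the sensitivity that makes parity hard for $AC^0$.

```python
import random
from functools import reduce
from operator import xor

BITS = 4      # heap sizes in 0..15 (illustrative 4-bit setting)
N_HEAPS = 20  # illustrative 20-heap setting

def nim_sum(heaps):
    """Nim-sum: bitwise XOR of all heap sizes."""
    return reduce(xor, heaps, 0)

def column_parities(heaps, bits=BITS):
    """Bit k of the nim-sum is the parity (sum mod 2) of bit k over all heaps."""
    return [sum((h >> k) & 1 for h in heaps) % 2 for k in range(bits)]

def bits_to_int(bit_list):
    """Reassemble a little-endian list of bits into an integer."""
    return sum(b << k for k, b in enumerate(bit_list))

rng = random.Random(0)
heaps = [rng.randrange(2 ** BITS) for _ in range(N_HEAPS)]

# The nim-sum is exactly BITS parity computations, each over N_HEAPS input bits.
assert nim_sum(heaps) == bits_to_int(column_parities(heaps))

# Sensitivity: flipping any one input bit flips the matching output bit,
# so no proper subset of the heaps determines the nim-sum.
for i in range(N_HEAPS):
    for k in range(BITS):
        flipped = list(heaps)
        flipped[i] ^= 1 << k
        assert nim_sum(flipped) == nim_sum(heaps) ^ (1 << k)
```

Every input bit therefore influences the output. A bounded-depth, fixed-precision approximator is exactly the kind of model that cannot track this dependence.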
Abstract
We introduce a practical circuit-complexity model for fixed-precision neural networks to explain and overcome a persistent learnability barrier in impartial games like NIM. We show that bounded-depth, polynomial-size, fixed-precision neural inference, including recurrent and attention-style architectures, is simulable by AC0 circuits. This places such networks strictly below TC0 and explains their inability to compute exact parity or the nim-sum. On the negative side, we prove that single-frame AlphaZero-style agents with AC0-constrained networks cannot achieve strong mastery of NIM, even with polynomial-time search, because their networks cannot represent global parity. On the positive side, we show that augmenting the state with a two-frame history exposes nimber differences that are local and AC0-computable. This enables a local restoration rule: after an opponent move, one can restore the invariant that the nim-sum is zero by matching the observed difference, without recomputing the global nim-sum from scratch. Empirically, our two-frame policy achieves near-perfect restoration accuracy in 20-heap NIM, whereas a one-frame baseline stays near chance. Finally, we justify AC0 as a model of feasible learnability. We distinguish between approximate majority, which is compatible with AC0 and learnable in practice, and the sharp majority required for parity, which is infeasible under fixed precision and noise.
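The restoration rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation. Positions are lists of heap sizes, the previous frame had nim-sum zero, and exactly one heap changed between the two frames. The function names `observed_difference` and `restore` are hypothetical. The move is derived from the single changed heap plus a scan for one bit, which amounts to an OR over heaps. It never computes an XOR over all heaps.

```python
from functools import reduce
from operator import xor

def nim_sum(heaps):
    """Global nim-sum; used here only to check results, not inside restore()."""
    return reduce(xor, heaps, 0)

def observed_difference(prev_heaps, curr_heaps):
    """Nimber difference exposed by two consecutive frames.

    A legal NIM move changes exactly one heap, so XOR-ing the frames
    heap by heap leaves one nonzero entry: old_size XOR new_size.
    """
    diffs = [a ^ b for a, b in zip(prev_heaps, curr_heaps)]
    changed = [i for i, d in enumerate(diffs) if d]
    if len(changed) != 1:
        raise ValueError("expected exactly one heap to change between frames")
    return changed[0], diffs[changed[0]]

def restore(prev_heaps, curr_heaps):
    """Return (heap index, new size) that cancels the opponent's difference d.

    Assumption: prev_heaps had nim-sum 0, so the current nim-sum equals d.
    Any heap h with the top bit of d set satisfies h ^ d < h, so reducing
    it to h ^ d is a legal move that restores nim-sum 0.
    """
    _, d = observed_difference(prev_heaps, curr_heaps)
    top = 1 << (d.bit_length() - 1)  # highest set bit of d
    for j, h in enumerate(curr_heaps):
        if h & top:
            return j, h ^ d
    raise ValueError("no restoring move; was the previous nim-sum nonzero?")

if __name__ == "__main__":
    prev = [1, 2, 3]  # nim-sum 0
    curr = [1, 2, 0]  # opponent empties heap 2, so d = 3 ^ 0 = 3
    j, new_size = restore(prev, curr)
    print(j, new_size)  # 1 1
```

In the usage example the sketch reduces heap 1 from 2 to 1, giving [1, 1, 0] with nim-sum 0. The changed heap itself never qualifies. The highest bit in which the old and new sizes differ is set in the old size and cleared in the new one. Because the nim-sum d has that bit set, some other heap must have it set as well.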
