Learning to Play Two-Player Perfect-Information Games without Knowledge
Quentin Cohen-Solal
TL;DR
This work develops Athénan, a zero-knowledge reinforcement-learning framework for two-player perfect-information games that combines non-linear tree learning, a deep-search variant called Descent, completion-based state resolution, and reinforcement heuristics with a novel ordinal action distribution. By unifying these components, Athénan learns game-state evaluations through self-play and achieves state-of-the-art results on multiple domains, notably surpassing Mohex 3HNN in Hex on 11×11 and 13×13 boards, Edax in Othello, and Sharp in Arimaa, all without predefined domain knowledge. The approach also demonstrates strong single-player performance in Morpion Solitaire and competitive Computer Olympiad results, indicating broad applicability to general game-playing tasks. Key contributions include tree learning generalized to non-linear evaluators, a data-generating search (Descent) optimized for learning data quality, a completion framework to leverage resolved states, and reinforcement heuristics that substantially boost learning efficiency. Collectively, the findings suggest that zero-knowledge reinforcement learning within a minimax-inspired framework can rival or exceed knowledge-based or MCTS-based systems across a diverse set of games, with practical implications for General Game Playing and scalable AI.
Abstract
In this paper, several techniques for learning game state evaluation functions by reinforcement are proposed. The first is a generalization of tree bootstrapping (tree learning): it is adapted to the context of reinforcement learning without knowledge based on non-linear functions. With this technique, no information is lost during the reinforcement learning process. The second is a modification of minimax with unbounded depth that extends the best sequences of actions to the terminal states. This modified search is intended to be used during the learning process. The third is to replace the classic gain of a game (+1 / -1) with a reinforcement heuristic. We study particular reinforcement heuristics such as: quick wins and slow defeats; scoring; mobility or presence. The fourth is a new action selection distribution. The conducted experiments suggest that these techniques improve the level of play. Finally, we apply these techniques to design player programs for the game of Hex (sizes 11 and 13) that surpass the level of Mohex 3HNN, using reinforcement learning from self-play without knowledge.
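
As an illustration of the third technique, the following minimal sketch contrasts the classic gain (+1 / -1) with one possible "quick wins and slow defeats" reinforcement heuristic. The function names, the parameters `moves_played` and `max_moves`, and the exact formula are assumptions made for this example and are not taken from the paper; the sketch only shows the general idea of letting the terminal value depend on game length.

```python
def classic_gain(first_player_won: bool) -> float:
    """Classic terminal value: +1 for a first-player win, -1 for a loss."""
    return 1.0 if first_player_won else -1.0


def depth_heuristic(first_player_won: bool, moves_played: int, max_moves: int) -> float:
    """Hypothetical 'quick wins and slow defeats' terminal value.

    The magnitude shrinks as the game gets longer, so that from the first
    player's point of view a short win scores higher than a long win, and
    a long defeat scores higher (less negative) than a short defeat.
    The formula is an assumption chosen for illustration.
    """
    if not 0 <= moves_played <= max_moves:
        raise ValueError("moves_played must lie in [0, max_moves]")
    # The +1 keeps the magnitude strictly positive, so the sign of the
    # value still identifies the winner even in the longest possible game.
    magnitude = float(max_moves - moves_played + 1)
    return magnitude if first_player_won else -magnitude


if __name__ == "__main__":
    max_moves = 121  # number of cells on an 11x11 Hex board
    print(depth_heuristic(True, 10, max_moves))   # quick win
    print(depth_heuristic(True, 50, max_moves))   # slow win
    print(depth_heuristic(False, 10, max_moves))  # quick defeat
    print(depth_heuristic(False, 50, max_moves))  # slow defeat
```

Under this assumed formula the sign still encodes the winner, as the classic gain does, while the magnitude gives the learner a finer signal for ordering terminal states.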
