Enhancing Chess Reinforcement Learning with Graph Representation

Tomas Rigaux, Hisashi Kashima

TL;DR

This paper focuses on Chess, and explores using a more generic Graph-based Representation of a game state, rather than a grid-based one, to introduce a more general architecture based on Graph Neural Networks (GNN).

Abstract

Mastering games is a hard task, as games can be extremely complex, and still fundamentally different in structure from one another. While the AlphaZero algorithm has demonstrated an impressive ability to learn the rules and strategy of a large variety of games, ranging from Go and Chess, to Atari games, its reliance on extensive computational resources and rigid Convolutional Neural Network (CNN) architecture limits its adaptability and scalability. A model trained to play on a $19\times 19$ Go board cannot be used to play on a smaller $13\times 13$ board, despite the similarity between the two Go variants. In this paper, we focus on Chess, and explore using a more generic Graph-based Representation of a game state, rather than a grid-based one, to introduce a more general architecture based on Graph Neural Networks (GNN). We also expand the classical Graph Attention Network (GAT) layer to incorporate edge-features, to naturally provide a generic policy output format. Our experiments, performed on smaller networks than the initial AlphaZero paper, show that this new architecture outperforms previous architectures with a similar number of parameters, being able to increase playing strength an order of magnitude faster. We also show that the model, when trained on a smaller $5\times 5$ variant of chess, is able to be quickly fine-tuned to play on regular $8\times 8$ chess, suggesting that this approach yields promising generalization abilities. Our code is available at https://github.com/akulen/AlphaGateau.
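To make the board-size argument concrete, here is a minimal sketch of a policy defined over graph edges rather than over a fixed grid. It is not the paper's architecture. The knight-move edge set, the random node features, and the linear edge scorer (standing in for the edge-feature GAT layer) are all illustrative assumptions, as are the names `build_board_graph` and `edge_policy`. The feature width `hs` borrows its name from the Figure 2 caption.

```python
import math
import random

# Assumption: knight moves stand in for the real set of chess moves.
KNIGHT_OFFSETS = [(1, 2), (2, 1), (2, -1), (1, -2),
                  (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def build_board_graph(n):
    """Return directed edges (src, dst) between squares of an n x n board."""
    edges = []
    for r in range(n):
        for c in range(n):
            for dr, dc in KNIGHT_OFFSETS:
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < n and 0 <= c2 < n:
                    edges.append((r * n + c, r2 * n + c2))
    return edges


def edge_policy(node_feats, edges, w_src, w_dst):
    """Score every edge with shared weights, then softmax over all edges.

    Hypothetical stand-in for a learned layer: the weights depend only on
    the feature width, never on the number of squares or edges.
    """
    def dot(w, x):
        return sum(wi * xi for wi, xi in zip(w, x))

    logits = [dot(w_src, node_feats[s]) + dot(w_dst, node_feats[d])
              for s, d in edges]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
hs = 4  # feature width per node (illustrative value)
w_src = [rng.gauss(0, 1) for _ in range(hs)]
w_dst = [rng.gauss(0, 1) for _ in range(hs)]

# The same weights produce a valid policy on both board sizes.
for n in (5, 8):
    edges = build_board_graph(n)
    feats = [[rng.gauss(0, 1) for _ in range(hs)] for _ in range(n * n)]
    policy = edge_policy(feats, edges, w_src, w_dst)
    print(n, len(edges), len(policy))
```

The output has one entry per edge: 96 for the $5\times 5$ board and 336 for the $8\times 8$ board under this toy move set. The parameters `w_src` and `w_dst` are shared across both, which is the property a fixed-size grid policy head lacks.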

Paper Structure

This paper contains 21 sections, 14 equations, 16 figures, 1 table, 2 algorithms.

Figures (16)

  • Figure 1: The starting positions of $8\times 8$ and $5\times 5$ chess games
  • Figure 2: The AlphaGateau network, $hs$ is the inner size of the feature vectors, and $L$ is the number of residual blocks.
  • Figure 3: Value head
  • Figure 5: The Elo ratings of AlphaZero and AlphaGateau with 5 residual layers trained over 500 iterations. The AlphaGateau model initially learns approximately 10 times faster than the AlphaZero model, and after 100 iterations settles to a rate of growth comparable to that of AlphaZero.
  • Figure 6: The Elo ratings of the first 100 iterations of the AlphaGateau model from Figure 5 are included for comparison. Initial training on $5\times 5$ chess increases the model's rating when evaluated on $8\times 8$ chess, even though it has not seen any $8\times 8$ chess position. The fine-tuned model starts from a good baseline and reaches performance comparable to the 5-layer model despite being undertrained for its size.
  • ...and 11 more figures