Representation Matters for Mastering Chess: Improved Feature Representation in AlphaZero Outperforms Switching to Transformers

Johannes Czech; Jannis Blüml; Kristian Kersting; Hedinn Steingrimsson

TL;DR

We propose a practical improvement, a simple change to the input representation and the value loss function, that boosts playing strength by up to 180 Elo points beyond what is currently achievable with AlphaZero in chess.

Abstract

While transformers have gained recognition as a versatile tool for artificial intelligence (AI), their use in chess, a classical AI benchmark, remains an unexplored challenge. Here, incorporating Vision Transformers (ViTs) into AlphaZero proves insufficient for chess mastery, mainly because of the ViTs' computational limitations. Our attempt to improve their efficiency by combining MobileNet and NextViT outperformed AlphaZero by about 30 Elo. However, a simple change to the input representation and the value loss function yields a far larger gain: up to 180 Elo points beyond what is currently achievable with AlphaZero in chess. In addition, experiments with the Integrated Gradients technique confirm that the network makes effective use of the newly introduced features.
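
To put these Elo gains in perspective, the standard logistic Elo model maps a rating difference d to an expected score E = 1 / (1 + 10^(-d/400)), where a win counts 1, a draw 0.5, and a loss 0. The sketch below assumes that model (the abstract does not state how the Elo differences were estimated); under it, +30 Elo corresponds to an expected score of about 54% and +180 Elo to about 74%.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) of the stronger side
    under the standard logistic Elo model with the usual 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


for diff in (30, 180):
    print(f"+{diff} Elo -> expected score {expected_score(diff):.3f}")
# +30 Elo -> expected score 0.543
# +180 Elo -> expected score 0.738
```
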

Paper Structure

This paper contains 17 sections, 6 equations, 8 figures, and 12 tables.

Introduction
AlphaVile: Integrating transformers into AlphaZero
AlphaVile-FX: The importance of representation
Expanding the input representation
Redefining the value loss representation
Investigating the significance of representation in chess mastery
Trade-off between efficiency and accuracy
Comparative assessment of playing strength
Interpretability of FX-features
Related Work
Conclusion
Supplementary Materials
Final Performance Overview of AlphaVile
Preliminary Experiments for Building AlphaVile
Optimizing Scaling Ratios for Network

Figures (8)

Figure 1: Architectural overview of the predictor network in AlphaVile. The Mobile Convolutional Block (MCB) is inspired by the work of Sandler et al. (2018), while the Next Transformer Block (NTB) is adopted from Li et al. (2022). The parameter $B$ denotes the number of hybrid blocks in the architecture and controls the model's scale. Our standard AlphaVile model uses ten MCBs in Stage 1 ($N_1 = 10$) and two Stage 2 blocks ($B = 2$). Each Stage 2 block consists of seven MCBs ($N_2 = 7$) and one NTB.
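
The block counts in Figure 1 can be made concrete with a small configuration sketch. The class and field names are hypothetical, and the order of blocks inside a Stage 2 block (seven MCBs followed by the NTB) is an assumption, since the caption gives only the counts. With the standard setting, the network contains 10 + 2 × 7 = 24 MCBs and 2 NTBs; increasing $B$ adds one hybrid block at a time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlphaVileLayout:
    """Block counts taken from the Figure 1 caption (names are hypothetical)."""
    n1: int = 10  # MCBs in Stage 1 (N_1)
    b: int = 2    # number of Stage 2 hybrid blocks (B)
    n2: int = 7   # MCBs inside each Stage 2 block (N_2)

    def block_sequence(self) -> List[str]:
        seq = ["MCB"] * self.n1
        for _ in range(self.b):
            # Assumed order: the convolutional blocks precede the transformer block.
            seq += ["MCB"] * self.n2 + ["NTB"]
        return seq


seq = AlphaVileLayout().block_sequence()
print(len(seq), seq.count("MCB"), seq.count("NTB"))  # 26 24 2
```
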
Figure 2: Comparison of the architectural components of convolution-based blocks. "DW" denotes depthwise convolution; batch norm and ReLU layers are omitted for clarity. AlphaVile replaces the conventional residual block introduced by He et al. (2016) with the mobile convolution block from MobileNet (Sandler et al., 2018). Additionally, we use the next convolution block originally introduced in NextViT by Li et al. (2022).
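
A minimal NumPy sketch of a mobile convolution block, assuming the MobileNetV2 inverted-residual structure (1x1 expansion, 3x3 depthwise convolution, 1x1 linear projection, skip connection), ReLU activations, and no batch norm or biases. The function names and the expansion ratio are illustrative, not AlphaVile's actual hyperparameters. The weight counts show why the swap saves compute: a residual block with two full 3x3 convolutions has 18C² weights, while the mobile block has 2EC + 9E with E = tC expanded channels.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def mobile_conv_block(x, w_expand, w_dw, w_project):
    """Inverted-residual (MobileNetV2-style) block for one position.

    x:         (C, H, W) input feature planes, e.g. H = W = 8 for a board
    w_expand:  (E, C) weights of the 1x1 expansion convolution
    w_dw:      (E, 3, 3) weights of the 3x3 depthwise convolution
    w_project: (C, E) weights of the 1x1 linear projection
    Batch norm and biases are omitted for brevity.
    """
    _, h, w = x.shape
    # 1x1 expansion: mixes channels independently on every square.
    z = relu(np.einsum("ec,chw->ehw", w_expand, x))
    # 3x3 depthwise convolution with zero padding (keeps the H x W size):
    # each expanded channel is filtered by its own 3x3 kernel.
    zp = np.pad(z, ((0, 0), (1, 1), (1, 1)))
    dw = np.zeros_like(z)
    for di in range(3):
        for dj in range(3):
            dw += w_dw[:, di, dj, None, None] * zp[:, di:di + h, dj:dj + w]
    z = relu(dw)
    # 1x1 linear projection back to C channels, then the skip connection.
    return x + np.einsum("ce,ehw->chw", w_project, z)


def residual_block_weights(c: int) -> int:
    """Two full 3x3 convolutions, C -> C channels."""
    return 2 * 9 * c * c


def mobile_block_weights(c: int, expansion: int) -> int:
    """1x1 expand (C -> E), 3x3 depthwise (E), 1x1 project (E -> C)."""
    e = expansion * c
    return c * e + 9 * e + e * c


rng = np.random.default_rng(0)
c, e = 16, 32
x = rng.standard_normal((c, 8, 8))
y = mobile_conv_block(x, rng.standard_normal((e, c)),
                      rng.standard_normal((e, 3, 3)),
                      rng.standard_normal((c, e)))
print(y.shape)  # (16, 8, 8)
print(residual_block_weights(256), mobile_block_weights(256, 2))  # 1179648 266752
```

With C = 256 and a hypothetical expansion ratio of 2, the mobile block needs roughly a quarter of the weights of the residual block.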
Figure 3: Comparison of AlphaVile with other efficient neural network architectures, focusing on the trade-off between accuracy and latency. Results were obtained from three independent seed runs.
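
One way to make "the best balance between accuracy and latency" precise is Pareto optimality: an architecture is on the front if no other architecture is both at least as fast and at least as accurate, and strictly better on one of the two. The function below is a hypothetical sketch of that check; it assumes each model is already summarised by a single latency and accuracy value (the caption does not say how the three seed runs are aggregated), and the inputs are toy values, not results from the paper.

```python
from typing import List, Tuple


def pareto_front(models: List[Tuple[str, float, float]]) -> List[str]:
    """Names of the models that are Pareto-optimal in (latency, accuracy).

    models: (name, latency, accuracy) triples; lower latency and higher
    accuracy are better. A model is dominated if some other model is at
    least as good on both axes and strictly better on at least one.
    """
    front = []
    for name, lat, acc in models:
        dominated = any(
            l2 <= lat and a2 >= acc and (l2 < lat or a2 > acc)
            for _, l2, a2 in models
        )
        if not dominated:
            front.append(name)
    return front


# Toy values for illustration only, not measurements from the paper.
toy = [("net_a", 1.0, 0.50), ("net_b", 2.0, 0.60), ("net_c", 2.5, 0.55)]
print(pareto_front(toy))  # ['net_a', 'net_b']
```
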
Figure 4: The AlphaZero-FX network performs strongly in chess (a), crazyhouse (b), and atomic chess (c), surpassing the vanilla version that uses Input Representation Version 1 without the WDLP head. The gain in chess is particularly large, at 180 Elo points. The AlphaVile network reaches a playing strength comparable to that of the AlphaZero network, especially at longer move times.
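
The caption does not spell out the WDLP head. A common value representation of this kind, used for example in Leela Chess Zero, predicts win/draw/loss probabilities with a softmax, derives the scalar value as P(win) − P(loss), and trains with cross-entropy against the game result, whereas AlphaZero regresses a scalar value onto z ∈ {+1, 0, −1} with a squared error. The sketch below assumes that WDL design for contrast; it does not model whatever the "P" component of WDLP adds, and all function names are hypothetical.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def wdl_value_and_loss(logits: np.ndarray, outcome: np.ndarray):
    """Assumed WDL value head.

    logits:  (N, 3) raw head outputs ordered (win, draw, loss)
    outcome: (N,) game results from the side to move: 0 = win, 1 = draw, 2 = loss
    Returns the scalar value P(win) - P(loss) in [-1, 1] and the mean
    cross-entropy against the observed outcome.
    """
    logp = log_softmax(logits)
    probs = np.exp(logp)
    value = probs[:, 0] - probs[:, 2]
    ce = -logp[np.arange(len(outcome)), outcome].mean()
    return value, ce


def scalar_mse_loss(value_pred: np.ndarray, outcome: np.ndarray) -> float:
    """AlphaZero-style scalar target z in {+1, 0, -1} with a squared error."""
    z = 1.0 - outcome  # 0 -> +1 (win), 1 -> 0 (draw), 2 -> -1 (loss)
    return float(((value_pred - z) ** 2).mean())


logits = np.array([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]])
outcome = np.array([0, 1])
value, ce = wdl_value_and_loss(logits, outcome)
print(value.round(3), round(float(ce), 3))
print(scalar_mse_loss(value, outcome))
```

A WDL head keeps the draw probability as a separate output, information that a single scalar collapses.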
Figure 5: The Integrated Gradients (IG) feature-attribution method shows that the network makes substantial use of the newly introduced FX-features. In the conventional input representation (Figure 5a), both positive and negative attributions fall predominantly on the piece maps. The enhanced input representation (Figure 5b) adds supplementary features and omits the two features marked with strike-through. IG uses the average of all inputs as the baseline for the attribution calculation.
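
Integrated Gradients attributes a prediction F(x) to input features by integrating the gradient along the straight path from a baseline b to x: IG_i = (x_i − b_i) ∫₀¹ ∂F/∂x_i(b + α(x − b)) dα. The sketch below approximates the integral with a midpoint Riemann sum and uses the mean input as the baseline, as the caption states. The tanh model is a hypothetical stand-in for the network, not the paper's setup. The final line checks the completeness property: the attributions sum to F(x) − F(b).

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=64):
    """Midpoint Riemann approximation of Integrated Gradients.

    IG_i(x) = (x_i - b_i) * integral_0^1 dF/dx_i(b + a * (x - b)) da
    grad_fn: callable returning the gradient of the model output at a point.
    """
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


# Hypothetical toy model F(v) = tanh(w . v) standing in for the network.
rng = np.random.default_rng(0)
w = 0.5 * rng.standard_normal(5)


def model(v):
    return np.tanh(w @ v)


def model_grad(v):
    return (1.0 - np.tanh(w @ v) ** 2) * w


inputs = rng.standard_normal((100, 5))   # toy input "dataset"
baseline = inputs.mean(axis=0)           # mean-input baseline, as in the caption
attr = integrated_gradients(model_grad, inputs[0], baseline)
# Completeness: attributions sum to F(x) - F(baseline), up to discretisation error.
print(attr.sum(), model(inputs[0]) - model(baseline))
```
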