Learning the Latent Rules of a Game from Data: A Chess Story

Ben Fauber

TL;DR

We show that pretrained foundational small language models with 28M and 125M parameters can be instruction fine-tuned on 1,000 to 1,000,000 examples to learn the rules of chess, propose legal moves, and accurately solve chess problems.

Abstract

We demonstrate that small pretrained foundational generative language models with millions of parameters can learn the latent rules of a process from data associated with that process. Inspired by Stefan Zweig's novella "Schachnovelle," known in English as "The Royal Game," we show that pretrained foundational small language models (SLMs) with 28M and 125M parameters can be instruction fine-tuned on 1,000 to 1,000,000 examples to learn the rules of chess, propose legal moves, and accurately solve chess problems. We also explore how successive fine-tuning epochs affect performance, and we show that increasing the number of instruction fine-tuning examples reduces model hallucinations.
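The instruction fine-tuning setup in the abstract can be illustrated with a minimal sketch of how one (board state, move) pair might be turned into a training example for a causal language model. Everything specific here is a hypothetical assumption for illustration, not the paper's actual format: the prompt wording, the `### Instruction` / `### Response` layout, encoding the board as a FEN string, and the helper names `ChessExample`, `to_prompt`, `to_training_text`, and `build_dataset`. The example position is the one reached after 1. d4 d5 2. c4 e6, with Nc3 as an illustrative target move.

```python
from dataclasses import dataclass

# Hypothetical prompt template: an illustrative assumption, not the paper's template.
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "Propose a legal move for {side} in standard algebraic notation (SAN) "
    "for the chess board state below.\n\n"
    "### Board (FEN):\n{fen}\n\n"
    "### Response:\n"
)


@dataclass
class ChessExample:
    fen: str   # board state in Forsyth-Edwards Notation
    move: str  # target move in SAN, e.g. "Nc3"


def side_to_move(fen: str) -> str:
    # The second space-separated FEN field is "w" (White) or "b" (Black).
    return "White" if fen.split()[1] == "w" else "Black"


def to_prompt(ex: ChessExample) -> str:
    # Model input at inference time: instruction plus board state, no answer.
    return PROMPT_TEMPLATE.format(side=side_to_move(ex.fen), fen=ex.fen)


def to_training_text(ex: ChessExample) -> str:
    # Fine-tuning text for a causal LM: the prompt followed by the target move.
    return to_prompt(ex) + ex.move


def build_dataset(pairs: list[tuple[str, str]], n: int) -> list[str]:
    # Keep the first n (FEN, SAN) pairs, e.g. n = 1_000 up to 1_000_000.
    return [to_training_text(ChessExample(fen, move)) for fen, move in pairs[:n]]


# Position after 1. d4 d5 2. c4 e6; "Nc3" is an illustrative target move.
QGD_FEN = "rnbqkbnr/ppp2ppp/4p3/3p4/2PP4/8/PP2PPPP/RNBQKBNR w KQkq - 0 3"
example_text = to_training_text(ChessExample(QGD_FEN, "Nc3"))
print(example_text)
```

At evaluation time only `to_prompt` would be fed to the model, and the text it generates after `### Response:` would be read back as the proposed move.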

Paper Structure

This paper contains 26 sections, 12 figures, and 4 tables.

Introduction
Background
Our Contribution
Methods
Standard Algebraic Chess Notation
Forsyth-Edwards Board Notation
Dataset
Data Sampling
Pretrained Foundational Small Language Models
Evaluation of Our Method
Results
Baseline Performance
Increasing Dataset Size Improves Performance
Chess Game Play Improves with More Data
Hallucinations Decrease with More Instruction Fine-Tuning Examples

Figures (12)

Figure 1: Illustration of our proposed task: predicting a move for a white piece (red arrow) given a board state in standard algebraic chess notation (SAN). The ranks and files of standard algebraic notation are labeled along the board edges.
Figure 2: Illustration of the initial array (left board diagram) and the standard algebraic chess notation (SAN) for the "Queen's Gambit Declined," a popular opening sequence in the 1920s and 1930s (Encyclopedia of Chess Openings, codes D30-D42). The moves, in SAN, are: 1. d4 d5 2. c4 e6. The position after these two moves is shown on the right board diagram.
Figure 3: Influence of the number of instruction fine-tuning examples. Percentage of legal proposed moves versus the number of instruction fine-tuning examples for the TinyStories-28M (blue) and OPT-125M (orange) language models, instruction fine-tuned with learning rate = 2e-4, batch size = 4, and epochs = 3. Each instruction fine-tuned language model was evaluated on 10,000 test chess board states drawn from WSM-10M to assess its ability to generate a legal proposed move.
Figure 4: Percentage of legal proposed moves versus the number of instruction fine-tuning examples for the TinyStories-28M (blue) and OPT-125M (orange) language models, instruction fine-tuned with learning rate = 2e-4, batch size = 4, and epochs = 3. Each instruction fine-tuned language model was evaluated on 1,000 test chess problems drawn from Check/Mate-in-1 to assess its ability to generate a legal proposed move.
Figure 5: Percentage of proposed moves that were legal and resulted in check or checkmate versus the number of instruction fine-tuning examples for the TinyStories-28M (blue) and OPT-125M (orange) language models, instruction fine-tuned with learning rate = 2e-4, batch size = 4, and epochs = 3. Each instruction fine-tuned language model was evaluated on 1,000 test chess problems drawn from Check/Mate-in-1 to assess its ability to generate a legal move that results in check or checkmate.
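The two metrics plotted in Figures 3 to 5 (percentage of legal proposed moves, and percentage of proposed moves that are legal and give check or checkmate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation code: it assumes each test position comes with the list of its legal moves in canonical SAN, produced by some separate move generator, and the helper names `strip_annotations` and `evaluate` are hypothetical. Legality is decided by exact match after removing suffixes, and check or checkmate is read from the canonical SAN suffix, since SAN marks check with `+` and checkmate with `#`.

```python
from typing import Iterable, Sequence


def strip_annotations(san: str) -> str:
    # Drop SAN check (+) and checkmate (#) markers plus any !/? annotations.
    return san.strip().rstrip("+#!?")


def evaluate(
    proposals: Sequence[str], legal_moves: Sequence[Iterable[str]]
) -> tuple[float, float]:
    """Return (% legal moves, % legal moves giving check or checkmate).

    legal_moves[i] lists every legal move for test position i in canonical
    SAN (with +/# suffixes), assumed to come from an external move generator.
    """
    if not proposals or len(proposals) != len(legal_moves):
        raise ValueError("proposals and legal_moves must be non-empty and equal length")
    legal = checking = 0
    for proposed, moves in zip(proposals, legal_moves):
        # Map each suffix-free move back to its canonical SAN spelling.
        canonical = {strip_annotations(m): m for m in moves}
        key = strip_annotations(proposed)
        if key in canonical:
            legal += 1
            if canonical[key].endswith(("+", "#")):
                checking += 1
    n = len(proposals)
    return 100.0 * legal / n, 100.0 * checking / n


# Toy, hand-written move lists (not real test data); "Ke9" mimics a hallucinated move.
proposals = ["e4", "Qxf7", "Ke9"]
legal_moves = [["e4", "d4", "Nf3"], ["Qxf7#", "Qh5", "Nc3"], ["Kf1", "Kg1"]]
pct_legal, pct_check = evaluate(proposals, legal_moves)
print(f"legal: {pct_legal:.1f}%  check/mate: {pct_check:.1f}%")
```

Both percentages use all proposed moves as the denominator. Exact string matching is a simplification: a legal move written with unusual disambiguation (for example "Nbd2" where "Nd2" is canonical) would be scored as illegal, so a stricter evaluation would parse each proposal against the board instead.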
