Learning the Latent Rules of a Game from Data: A Chess Story

Ben Fauber

TL;DR

We show that pretrained foundational small language models with 28M and 125M parameters can be instruction fine-tuned on 1,000 to 1,000,000 examples to learn the rules of chess, propose legal moves, and accurately solve chess problems.

Abstract

We demonstrate that small pretrained foundational generative language models with millions of parameters can learn the latent rules of a process from data associated with that process. Inspired by Stefan Zweig's novella "Schachnovelle," also known as "The Royal Game" in English, we show that 28M and 125M parameter pretrained foundational small language models (SLMs) can be instruction fine-tuned on 1,000 to 1,000,000 examples to learn the rules of chess, propose legal moves, and accurately solve chess problems. We also explore how successive fine-tuning epochs improve outcomes, and we demonstrate that increasing the number of instruction fine-tuning examples reduces model hallucinations.

Paper Structure (26 sections, 12 figures, 4 tables)

This paper contains 26 sections, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Illustration of our proposed task: prediction of a white chess piece move (red arrow) given a board state in standard algebraic chess notation (SAN). The chessboard has standard algebraic notation ranks and files along the board edges.
  • Figure 2: Illustration of the initial array (left board diagram) and the standard algebraic chess notation (SAN) for the "Queen's Gambit Declined," a popular opening move sequence in the 1920s and 1930s (Encyclopedia of Chess Openings, sequences D30-42). The moves, in SAN, are: 1. d4 d5 2. c4 e6. The position reached after these two moves is shown in the right board diagram.
  • Figure 3: Influence of increasing instruction fine-tuning examples. Percentage of legal proposed moves versus count of the instruction fine-tuning examples for the TinyStories-28M (blue) and OPT-125M (orange) language models, instruction fine-tuned with learning rate = 2e-4, batch size = 4, and epochs = 3. The performance of each instruction fine-tuned language model was evaluated using 10,000 test instances of chess board states drawn from WSM-10M to assess the model's ability to generate a legal proposed move.
  • Figure 4: Percentage of legal proposed moves versus count of the instruction fine-tuning examples for the TinyStories-28M (blue) and OPT-125M (orange) language models, instruction fine-tuned with learning rate = 2e-4, batch size = 4, and epochs = 3. The performance of each instruction fine-tuned language model was evaluated using 1,000 test instances of chess problems drawn from Check/Mate-in-1 to assess the model's ability to generate a legal proposed move.
  • Figure 5: Percentage of proposed moves which were legal and resulted in check or checkmate versus count of the instruction fine-tuning examples for the TinyStories-28M (blue) and OPT-125M (orange) language models, instruction fine-tuned with learning rate = 2e-4, batch size = 4, and epochs = 3. The performance of each instruction fine-tuned language model was evaluated using 1,000 test instances of chess problems drawn from Check/Mate-in-1 to assess the model's ability to generate a legal move that resulted in check or checkmate.
  • ...and 7 more figures
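
The captions for Figures 3 and 4 report the percentage of proposed moves that are legal. A minimal sketch of how such a metric could be computed is given below. It is not the authors' evaluation code: the function names, the `"start"` board-state label, and the toy legality oracle (which only knows the 20 legal White moves from the initial array) are all assumptions made for illustration. A real evaluation would presumably delegate legality checks to a full chess rules engine.

```python
from typing import Callable, Iterable, Tuple

# The 20 legal White moves from the initial array, in SAN:
# 16 pawn moves (one or two squares on each file) plus 4 knight moves.
INITIAL_LEGAL_MOVES = frozenset(
    [f"{file}{rank}" for file in "abcdefgh" for rank in "34"]
    + ["Na3", "Nc3", "Nf3", "Nh3"]
)


def is_legal_from_start(board_state: str, move: str) -> bool:
    """Toy legality oracle (assumption): only understands the initial position."""
    if board_state != "start":
        raise ValueError("this toy oracle only knows the initial position")
    return move.strip() in INITIAL_LEGAL_MOVES


def legal_move_percentage(
    proposals: Iterable[Tuple[str, str]],
    is_legal: Callable[[str, str], bool],
) -> float:
    """Return the percentage of (board_state, proposed_move) pairs that are legal."""
    proposals = list(proposals)
    if not proposals:
        return 0.0
    legal = sum(1 for state, move in proposals if is_legal(state, move))
    return 100.0 * legal / len(proposals)


# Hypothetical model outputs for the initial position.
# "Ke2" is illegal (e2 is occupied by a pawn); "e5" is illegal (too far for a pawn).
proposals = [("start", "d4"), ("start", "Nf3"), ("start", "Ke2"), ("start", "e5")]
print(legal_move_percentage(proposals, is_legal_from_start))
```

Passing the legality check in as a function keeps the metric independent of how legality is decided, so the toy oracle could be swapped for a complete move generator without changing `legal_move_percentage`.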