You shall know a piece by the company it keeps. Chess plays as a data for word2vec models

Boris Orekhov

TL;DR

This work treats chess moves as textual tokens and trains word2vec-style embeddings on two kinds of data: moves-only sequences and moves paired with board positions. Using a corpus of over 5.4 million high-level games, the study analyzes vector neighborhoods, cosine similarities, and t-SNE visualizations to uncover semantic structure, including quasi-synonyms and endgame-specific clustering. The aim is not to improve move choice in engines. Instead, the results show that distributional semantics capture meaningful aspects of game stages, piece identity, and castling dynamics, and that context shapes the representation of a move even when positional information is limited. The findings point to the academic value of cross-domain embeddings: despite their practical limitations, context-rich representations can expose structural regularities in chess data.

Abstract

In this paper, I apply linguistic methods of analysis to non-linguistic data, namely chess games, metaphorically equating one with the other and seeking analogies. Chess game notation is also a kind of text, and one can treat the records of moves or of piece positions as words and statements in a certain language. I show how word embeddings (word2vec) can work on chess game texts instead of natural language texts. I do not see how this representation of chess data can be used productively: it is unlikely that these vector models will help engines or people choose the best move. But in a purely academic sense, it is clear that such methods of information representation capture something important about the very nature of the game, something that does not necessarily lead to a win.
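The idea of reading moves as words can be illustrated with a minimal sketch. It is not the pipeline used in the paper: the tokenization rule, the three toy games, the window size, and the helper names `tokenize_game`, `cooccurrence_vectors`, and `cosine` are all assumptions made for illustration, and plain co-occurrence counts stand in for a trained word2vec model. On a real corpus one would more likely pass the same token lists to a word2vec implementation such as the one in gensim.

```python
import math
import re
from collections import Counter, defaultdict

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def tokenize_game(movetext):
    """Turn SAN movetext into a list of move tokens (the 'words' of one game)."""
    tokens = []
    for tok in movetext.split():
        if re.fullmatch(r"\d+\.+", tok):  # move numbers such as "1." or "3..."
            continue
        if tok in RESULTS:  # game result markers are not moves
            continue
        tokens.append(tok)
    return tokens


def cooccurrence_vectors(games, window=2):
    """Count-based stand-in for word2vec: each move is described by its neighbours."""
    vectors = defaultdict(Counter)
    for game in games:
        for i, move in enumerate(game):
            lo, hi = max(0, i - window), min(len(game), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[move][game[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical toy corpus; the paper's corpus is far larger.
games_text = [
    "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 1-0",
    "1. e4 e5 2. Nf3 Nc6 3. Bc4 Bc5 1/2-1/2",
    "1. d4 d5 2. c4 e6 3. Nc3 Nf6 0-1",
]
games = [tokenize_game(g) for g in games_text]
vectors = cooccurrence_vectors(games, window=2)

# Bb5 and Bc4 share neighbours (Nf3, Nc6); Bb5 and Nc3 share none.
print(cosine(vectors["Bb5"], vectors["Bc4"]))
print(cosine(vectors["Bb5"], vectors["Nc3"]))
```

In this toy setting the two bishop moves that follow the same opening sequence end up closer to each other than to a move from a different opening, which is the kind of "company it keeps" effect the title alludes to. Note that the sketch does not distinguish White's moves from Black's; whether and how to do so is a design choice left open here.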
