Generating Creative Chess Puzzles

Xidong Feng, Vivek Veeriah, Marcus Chiam, Michael Dennis, Ryan Pachauri, Thomas Tumiel, Federico Barbero, Johan Obando-Ceron, Jiaxin Shi, Satinder Singh, Shaobo Hou, Nenad Tomašev, Tom Zahavy

TL;DR

This work tackles the challenge of generating truly creative chess puzzles by formalizing a multi-dimensional standard (uniqueness, counter-intuitiveness, novelty, and aesthetics) and implementing a reinforcement-learning framework guided by rewards derived from the search statistics of chess engines such as Stockfish. The authors train several discrete generative models on the Lichess Puzzler data and refine them with outcome-based RL, diversity filters, and realism constraints (KL penalties and data seeding) to produce puzzles that are unique, counter-intuitive, and aesthetically appealing. Empirical results show that RL raises the rate of counter-intuitive puzzles to 2.5%, above the Lichess dataset rate (2.1%) and the best Lichess-trained model (0.4%). Human experts rate the AI-generated puzzles as highly creative and enjoyable, often rivaling or surpassing puzzles from composed books. The work demonstrates a scalable, human-in-the-loop approach to computational creativity in chess, with broader implications for open-ended problem solving in other domains.
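The TL;DR credits KL penalties with keeping RL-generated puzzles realistic. The exact formulation is not given in this summary, so the following is only a minimal sketch assuming the common RLHF-style shaping: the puzzle (task) reward minus beta times a single-sample estimate of KL(policy || reference), where the reference is the frozen supervised model. The function name `kl_penalized_reward` and the default `beta=0.1` are hypothetical.

```python
from typing import Sequence


def kl_penalized_reward(
    task_reward: float,
    policy_logprobs: Sequence[float],
    reference_logprobs: Sequence[float],
    beta: float = 0.1,  # hypothetical penalty strength
) -> float:
    """Shape a per-sequence reward with a KL penalty toward a reference model.

    Assumption (not from the paper): both lists hold the log-probabilities
    that the RL policy and the frozen supervised model assign to each token
    of the SAME sampled sequence, e.g. the characters of a generated FEN.
    Summing log p_policy - log p_ref over the tokens gives a single-sample
    Monte Carlo estimate of KL(policy || reference) for that sequence.
    """
    if len(policy_logprobs) != len(reference_logprobs):
        raise ValueError("log-prob sequences must have equal length")
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, reference_logprobs))
    return task_reward - beta * kl_estimate
```

A larger `beta` pulls the generator harder toward the supervised distribution, trading some puzzle reward for positions that look more like real games.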

Abstract

While generative AI is advancing rapidly across many domains, generating truly creative, aesthetic, and counter-intuitive outputs remains a challenge. This paper presents an approach to these difficulties in the domain of chess puzzles. We first benchmark generative AI architectures and then introduce an RL framework with novel rewards based on chess-engine search statistics to overcome some of their shortcomings. The rewards are designed to enhance a puzzle's uniqueness, counter-intuitiveness, diversity, and realism. Our RL approach increases counter-intuitive puzzle generation more than tenfold, from 0.22% (supervised) to 2.5%, surpassing the rate in the existing dataset (2.1%) and that of the best Lichess-trained model (0.4%). Our puzzles meet novelty and diversity benchmarks, retain aesthetic themes, and are rated by human experts as more creative, enjoyable, and counter-intuitive than composed book puzzles, even approaching classic compositions. Our final outcome is a curated booklet of these AI-generated puzzles, whose creativity has been acknowledged by three world-renowned experts.
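Uniqueness, one of the reward criteria named above, is often operationalized as "exactly one move wins". A minimal sketch, assuming per-move engine evaluations in centipawns (from the side to move's point of view, as a multi-PV search would report) are already available; the function name and both thresholds are hypothetical and not taken from the paper.

```python
from typing import Mapping


def has_unique_solution(
    move_evals_cp: Mapping[str, int],
    win_threshold_cp: int = 300,    # hypothetical: "clearly winning"
    max_second_best_cp: int = 100,  # hypothetical: "not clearly winning"
) -> bool:
    """Return True if the best move clearly wins and the runner-up does not.

    move_evals_cp maps each candidate move (e.g. in UCI notation) to an
    engine evaluation in centipawns for the side to move.
    """
    if not move_evals_cp:
        return False
    ranked = sorted(move_evals_cp.values(), reverse=True)
    best = ranked[0]
    # A position with a single legal move has no runner-up at all.
    second = ranked[1] if len(ranked) > 1 else float("-inf")
    return best >= win_threshold_cp and second <= max_second_best_cp
```

Requiring the runner-up to stay below its own threshold, rather than only below the best move, rejects positions in which several different moves all win.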

Paper Structure

This paper contains 44 sections, 8 equations, 20 figures, and 9 tables.

Figures (20)

  • Figure 1: Our approach begins by training a generative model on the Lichess dataset (see the section on generative models), followed by RL training (see the RL section). Each position generated by the RL model is verified for legality, uniqueness, novelty, and counter-intuitiveness using chess-engine search statistics (see the puzzle-scoring section). The verified positions are then filtered for aesthetics (see the appendix on aesthetic features) and selected for our booklet based on reward.
  • Figure 2: Booklet examples of creative chess puzzles generated with our methods, each with a unique, counter-intuitive, and aesthetic solution (written upside down). Puzzle 1 (left): White undefends both rooks with Rg6+. After one of the hanging rooks is captured, White plays the slow move Qa1, sacrificing the second rook. The remaining White queen and bishop then coordinate very well after Qf6+, with an unstoppable attack. The double rook sacrifice is counter-intuitive and aesthetic for a human: both rooks are initially very active, and it is surprising that the remaining queen and bishop are sufficient to win the game. The winning move in this position is both counter-intuitive and the only one for White. Even Stockfish needs a moment to identify it (see the position on Lichess: https://lichess.org/analysis/standard/1r1r2k1/Q2p1R1p/2p2R2/1p3pB1/1P4q1/8/5K2/8_w_-_-_0_1), but then confidently confirms that it is the sole path to victory. For a description of positions 2 to 4, see the appendix on aesthetic features.
  • Figure 3: Visualization of the counter-intuitiveness calculation, with the corresponding equations.
  • Figure 4: Distribution of aesthetic themes in chess positions across different datasets.
  • Figure 5: The RL run surpasses the baselines on puzzle metrics and keeps improving diversity throughout training. The curves in panels (a) to (d) are smoothed for better visualization.
  • ...and 15 more figures
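The selection pipeline described in Figure 1 (verify each generated position, filter for aesthetics, then select by reward) can be sketched as a chain of filters. This is a minimal illustration, not the authors' code: the `Candidate` type, `select_for_booklet`, and the theme labels are hypothetical, and each verification check (legality, uniqueness, novelty, counter-intuitiveness) is assumed to be supplied as a plain predicate.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence


@dataclass(frozen=True)
class Candidate:
    """A generated position with a (hypothetical) reward and theme tags."""
    fen: str
    reward: float
    themes: FrozenSet[str] = frozenset()


# A check is any predicate over a candidate, e.g. a legality, uniqueness,
# novelty or counter-intuitiveness test (all hypothetical stand-ins here).
Check = Callable[[Candidate], bool]


def select_for_booklet(
    candidates: Sequence[Candidate],
    checks: Sequence[Check],
    aesthetic_themes: FrozenSet[str],
    k: int,
) -> List[Candidate]:
    # 1) Keep only positions that pass every verification check.
    verified = [c for c in candidates if all(check(c) for check in checks)]
    # 2) Aesthetic filter: keep positions tagged with at least one wanted theme.
    aesthetic = [c for c in verified if c.themes & aesthetic_themes]
    # 3) Rank the survivors by reward (highest first) and keep the top k.
    return sorted(aesthetic, key=lambda c: c.reward, reverse=True)[:k]
```

For example, a novelty check can be as simple as `lambda c: c.fen not in training_fens`, where `training_fens` is an assumed set of positions from the training data.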
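Figure 3's exact counter-intuitiveness equations are not reproduced in this summary. One plausible way to turn "search statistics" into such a score is to measure how much a deeper search revises the value of the solution move relative to a shallow search: a move that looks poor at low depth but wins at high depth is, in that sense, counter-intuitive. The sketch below implements that assumption only; the logistic centipawn-to-win-probability mapping and its `scale` are hypothetical choices, not the paper's definition.

```python
import math
from typing import Sequence


def win_probability(cp: float, scale: float = 0.004) -> float:
    """Map a centipawn score to a win probability in (0, 1).

    Uses a logistic curve; `scale` is a hypothetical illustrative value.
    """
    return 1.0 / (1.0 + math.exp(-scale * cp))


def counter_intuitiveness(solution_evals_by_depth: Sequence[float]) -> float:
    """Gain in win probability for the solution move between the shallowest
    and the deepest search (evaluations ordered by increasing depth).

    Large positive values mean shallow search badly undervalued the move;
    values near zero mean the move looked good from the start.
    """
    if len(solution_evals_by_depth) < 2:
        raise ValueError("need evaluations from at least two search depths")
    shallow = win_probability(solution_evals_by_depth[0])
    deep = win_probability(solution_evals_by_depth[-1])
    return deep - shallow
```

A score of this kind rewards positions like the one in Figure 2, where even Stockfish initially needs time before settling on the sole winning move.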
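The Lichess link in Figure 2 encodes the puzzle position directly in the URL path as a FEN string, with underscores standing in for the spaces between FEN fields. The helper names below are hypothetical; the sketch recovers the FEN from such a URL and decodes its piece-placement field (ranks 8 down to 1, digits for runs of empty squares, uppercase for White) into a square-to-piece map.

```python
from typing import Dict


def fen_from_lichess_analysis_url(url: str) -> str:
    """Extract the FEN from a Lichess standard analysis URL."""
    prefix = "/analysis/standard/"
    idx = url.find(prefix)
    if idx == -1:
        raise ValueError("not a Lichess standard analysis URL")
    # Underscores separate the FEN fields in the URL path.
    return url[idx + len(prefix):].replace("_", " ")


def parse_placement(fen: str) -> Dict[str, str]:
    """Decode the piece-placement field of a FEN into {square: piece}."""
    ranks = fen.split()[0].split("/")
    if len(ranks) != 8:
        raise ValueError("piece placement must describe 8 ranks")
    board: Dict[str, str] = {}
    for rank_index, rank_str in enumerate(ranks):
        rank = 8 - rank_index  # FEN lists rank 8 first
        file_index = 0
        for ch in rank_str:
            if ch.isdigit():
                file_index += int(ch)  # run of empty squares
            else:
                board["abcdefgh"[file_index] + str(rank)] = ch
                file_index += 1
        if file_index != 8:
            raise ValueError(f"rank {rank} does not cover 8 files")
    return board
```

Applied to the Figure 2 link, this yields White rooks on f6 and f7, the White bishop on g5, and the Black king and queen on g8 and g4, with White to move.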