Generating Creative Chess Puzzles
Xidong Feng, Vivek Veeriah, Marcus Chiam, Michael Dennis, Ryan Pachauri, Thomas Tumiel, Federico Barbero, Johan Obando-Ceron, Jiaxin Shi, Satinder Singh, Shaobo Hou, Nenad Tomašev, Tom Zahavy
TL;DR
This work tackles the challenge of generating truly creative chess puzzles. It formalizes a multi-dimensional standard (uniqueness, counter-intuitiveness, novelty, and aesthetics) and implements a reinforcement-learning (RL) framework whose rewards are computed from chess-engine (e.g., Stockfish) search statistics. The authors train several discrete generative models on Lichess Puzzler data and augment them with outcome-based RL and diversity filters to produce puzzles that are unique, counter-intuitive, and aesthetically appealing, while KL penalties and data seeding keep the generated positions realistic. Empirically, RL raises the rate of counter-intuitive puzzles from 0.22% (supervised) to 2.5%, surpassing both the Lichess dataset rate (2.1%) and the best Lichess-trained model (0.4%). Human experts rated the AI-generated puzzles as more creative, enjoyable, and counter-intuitive than puzzles from composed books, and close to classic compositions. The work demonstrates a scalable, human-in-the-loop approach to computational creativity in chess, with broader implications for open-ended problem solving in other domains.
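The TL;DR mentions keeping generated puzzles realistic through KL penalties. A common way to do this in RL fine-tuning, shown here only as a minimal sketch and not as the authors' implementation, is to subtract a scaled KL estimate between the RL policy and the supervised reference model from the engine-based task reward. The function names, the per-token log-probability inputs, and the weight `beta` are hypothetical.

```python
from typing import Sequence


def sequence_logprob(token_logprobs: Sequence[float]) -> float:
    """Log-probability of a whole generated puzzle (sum over its tokens)."""
    return sum(token_logprobs)


def kl_regularized_reward(
    task_reward: float,
    policy_token_logprobs: Sequence[float],
    ref_token_logprobs: Sequence[float],
    beta: float = 0.1,  # hypothetical penalty weight
) -> float:
    """Engine-based task reward minus a KL penalty toward the reference model.

    For a puzzle x sampled from the policy pi, log pi(x) - log pi_ref(x) is a
    single-sample unbiased estimate of KL(pi || pi_ref). Puzzles the supervised
    reference model finds unlikely (i.e. less realistic) are penalized more.
    """
    kl_estimate = sequence_logprob(policy_token_logprobs) - sequence_logprob(ref_token_logprobs)
    return task_reward - beta * kl_estimate


# Hypothetical numbers: the policy assigns the puzzle log-prob -1.0,
# the reference model -2.0, so the KL estimate is 1.0.
print(kl_regularized_reward(1.0, [-0.5, -0.5], [-1.0, -1.0], beta=0.5))  # 0.5
```

Larger values of `beta` trade reward for staying closer to the distribution of real Lichess puzzles; `beta = 0` recovers the unregularized task reward.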
Abstract
While generative AI is advancing rapidly across many domains, generating truly creative, aesthetic, and counter-intuitive outputs remains a challenge. This paper presents an approach to these difficulties in the domain of chess puzzles. We start by benchmarking generative AI architectures, and then introduce an RL framework with novel rewards based on chess engine search statistics to overcome some of the shortcomings this benchmark reveals. The rewards are designed to enhance a puzzle's uniqueness, counter-intuitiveness, diversity, and realism. Our RL approach increases the rate of counter-intuitive puzzles roughly tenfold, from 0.22% (supervised) to 2.5%, surpassing both the rate in the existing dataset (2.1%) and that of the best Lichess-trained model (0.4%). Our puzzles meet novelty and diversity benchmarks, retain aesthetic themes, and are rated by human experts as more creative, enjoyable, and counter-intuitive than composed book puzzles, even approaching classic compositions. Our final outcome is a curated booklet of these AI-generated puzzles, whose creativity has been acknowledged by three world-renowned experts.
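The abstract describes rewards built from chess engine search statistics. One plausible way to turn such statistics into uniqueness and counter-intuitiveness signals is sketched below under stated assumptions; it is not necessarily the paper's exact definition. A puzzle is treated as unique when the deep-search best move beats every alternative by a clear margin, and as counter-intuitive when a shallow search prefers a different move. The per-move centipawn evaluations are assumed to come from an engine such as Stockfish run at two search depths; the 200-centipawn margin, the function names, and the example moves are hypothetical.

```python
from typing import Dict

# Hypothetical input format: move (SAN) -> evaluation in centipawns from the
# side to move's perspective, as reported by an engine at one search depth.
Evals = Dict[str, int]


def best_move(evals: Evals) -> str:
    """Move with the highest evaluation (the first one wins on ties)."""
    return max(evals, key=lambda move: evals[move])


def is_unique(deep_evals: Evals, margin_cp: int = 200) -> bool:
    """Unique if the deep-search best move beats the runner-up by margin_cp."""
    ranked = sorted(deep_evals.values(), reverse=True)
    if len(ranked) < 2:
        return False  # a single forced move is not treated as a puzzle here
    return ranked[0] - ranked[1] >= margin_cp


def is_counter_intuitive(shallow_evals: Evals, deep_evals: Evals) -> bool:
    """Counter-intuitive if shallow and deep search disagree on the best move."""
    return best_move(shallow_evals) != best_move(deep_evals)


def puzzle_reward(shallow_evals: Evals, deep_evals: Evals, margin_cp: int = 200) -> float:
    """Binary reward: 1.0 only for puzzles that are unique AND counter-intuitive."""
    ok = is_unique(deep_evals, margin_cp) and is_counter_intuitive(shallow_evals, deep_evals)
    return 1.0 if ok else 0.0


# Hypothetical position: a shallow search prefers the capture Qxd5,
# but a deep search finds that Rh8+ wins decisively.
shallow = {"Qxd5": 150, "Rh8+": -300, "Nf3": 20}
deep = {"Qxd5": -50, "Rh8+": 900, "Nf3": 10}
print(puzzle_reward(shallow, deep))  # 1.0
```

Requiring both conditions together is a design choice: a large margin alone would also reward obvious captures, while disagreement between depths alone would also reward positions with several roughly equal moves.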
