Making New Connections: LLMs as Puzzle Generators for The New York Times' Connections Word Game
Tim Merino, Sam Earle, Ryan Sudhakaran, Shyam Sudhakaran, Julian Togelius
TL;DR
The paper demonstrates that large language models, guided by Tree-of-Thoughts-inspired prompting and embedding-based difficulty assessment, can generate diverse and challenging Connections puzzles that users perceive as comparable to NYT puzzles. The authors implement a three-component pipeline (puzzle creator, editor, and evaluator) and explore two difficulty-enhancing strategies, intentional overlap and false groups, which they evaluate in a human study against published NYT puzzles. Results show that AI-generated puzzles can rival human-authored ones on several metrics, although certain strategies increase difficulty and lower solve rates, which highlights design considerations for procedural content generation (PCG) in word games. The work suggests that LLM-driven puzzle generation can augment design workflows and inform procedural content generation for other semantic clustering games, with potential for human-in-the-loop refinement and extension to other domains.
Abstract
The Connections puzzle is a word association game published daily by The New York Times (NYT). In this game, players are asked to find groups of four words that are connected by a common theme. While solving a given Connections puzzle requires both semantic knowledge and abstract reasoning, generating novel puzzles additionally requires a form of metacognition: generators must be able to accurately model the downstream reasoning of potential solvers. In this paper, we investigate the ability of the GPT family of Large Language Models (LLMs) to generate challenging and creative word games for human players. We start with an analysis of the word game Connections and the unique challenges it poses as a Procedural Content Generation (PCG) domain. We then propose a method for generating Connections puzzles using LLMs by adapting a Tree of Thoughts (ToT) prompting approach. We evaluate this method by conducting a user study, asking human players to compare AI-generated puzzles against published Connections puzzles. Our findings show that LLMs are capable puzzle creators, and can generate diverse sets of enjoyable, challenging, and creative Connections puzzles as judged by human users.
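
To make the game structure described above concrete, the following is a minimal sketch of how a Connections puzzle could be represented and how a player's guess could be checked. It is an illustration only, not the authors' implementation: the names `Group`, `Puzzle`, `validate`, and `check_guess` are hypothetical, the example words are invented, and the four-group layout is assumed from the standard NYT format rather than stated in the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

GROUP_SIZE = 4  # words per group, as described in the abstract
NUM_GROUPS = 4  # assumption: the standard NYT layout of 16 words


@dataclass(frozen=True)
class Group:
    """One hidden category: a theme and the words that belong to it."""
    theme: str
    words: frozenset


@dataclass(frozen=True)
class Puzzle:
    """A full puzzle: a fixed number of disjoint groups."""
    groups: tuple

    def validate(self) -> None:
        """Raise ValueError if the puzzle is not structurally well formed."""
        if len(self.groups) != NUM_GROUPS:
            raise ValueError(f"expected {NUM_GROUPS} groups, got {len(self.groups)}")
        seen = set()
        for group in self.groups:
            if len(group.words) != GROUP_SIZE:
                raise ValueError(f"group {group.theme!r} must have {GROUP_SIZE} words")
            if seen & group.words:
                raise ValueError(f"group {group.theme!r} reuses a word from another group")
            seen |= group.words

    def check_guess(self, guess: Iterable[str]) -> Optional[str]:
        """Return the theme if the guess matches a group exactly, else None."""
        guess_set = frozenset(guess)
        for group in self.groups:
            if group.words == guess_set:
                return group.theme
        return None


# Hypothetical example puzzle (invented for illustration, not from the paper).
EXAMPLE = Puzzle(groups=(
    Group("FRUIT", frozenset({"APPLE", "PEAR", "PLUM", "FIG"})),
    Group("COLORS", frozenset({"RED", "BLUE", "GREEN", "PINK"})),
    Group("PLANETS", frozenset({"MARS", "VENUS", "EARTH", "SATURN"})),
    Group("CARD SUITS", frozenset({"HEART", "SPADE", "CLUB", "DIAMOND"})),
))
EXAMPLE.validate()
```

A guess is order-independent in this sketch, so `EXAMPLE.check_guess(["PEAR", "FIG", "APPLE", "PLUM"])` matches the fruit group. The structural check only enforces group sizes and disjointness; it says nothing about whether a puzzle is fair or interesting to a solver, which is the harder problem the abstract describes.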
