Making New Connections: LLMs as Puzzle Generators for The New York Times' Connections Word Game

Tim Merino, Sam Earle, Ryan Sudhakaran, Shyam Sudhakaran, Julian Togelius

TL;DR

The paper shows that large language models, guided by Tree-of-Thoughts-inspired prompting and embedding-based difficulty assessment, can generate diverse and challenging Connections puzzles that users perceive as comparable to NYT puzzles. The authors implement a three-component pipeline (puzzle creator, editor, and evaluator) and explore two strategies for increasing difficulty, intentional overlap and false groups, which they evaluate in a human study against published NYT puzzles. The AI-generated puzzles rival human-authored ones on several metrics, although certain strategies increase difficulty and lower solve rates, a trade-off that matters when designing procedural content generation (PCG) for word games. The work suggests that LLM-driven puzzle generation can augment design workflows and inform PCG for other semantic clustering games, with room for human-in-the-loop refinement and extension to other domains.

Abstract

The Connections puzzle is a word association game published daily by The New York Times (NYT). In this game, players are asked to find groups of four words that are connected by a common theme. While solving a given Connections puzzle requires both semantic knowledge and abstract reasoning, generating novel puzzles additionally requires a form of metacognition: generators must be able to accurately model the downstream reasoning of potential solvers. In this paper, we investigate the ability of the GPT family of Large Language Models (LLMs) to generate challenging and creative word games for human players. We start with an analysis of the word game Connections and the unique challenges it poses as a Procedural Content Generation (PCG) domain. We then propose a method for generating Connections puzzles using LLMs by adapting a Tree of Thoughts (ToT) prompting approach. We evaluate this method by conducting a user study, asking human players to compare AI-generated puzzles against published Connections puzzles. Our findings show that LLMs are capable puzzle creators, and can generate diverse sets of enjoyable, challenging, and creative Connections puzzles as judged by human users.

Paper Structure

This paper contains 33 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: The help page of the Connections puzzle
  • Figure 2: A Connections puzzle, published by The New York Times on Jan 19, 2024
  • Figure 3: Overview of LLM-driven pipelines for generating Connections puzzles with false or intentionally overlapping groups.
  • Figure 4: User response data for survey questions 4-6. Colored bars represent AI puzzle preference, while gray bars represent "Tie / Neither". AI-generated puzzles with LLM-generated false groups, in particular, are competitive with NYT puzzles in terms of user preference and perceived creativity, while being judged generally more difficult.
  • Figure 5: Number of mistakes made by percentage of puzzle plays, grouped by puzzle sub-type. AI-generated puzzles involving intentional overlaps proved most difficult to human players, while those seeded with false groups from existing NYT puzzles proved significantly easier than actual NYT puzzles.