Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems
Shmuel Berman, Kathleen McKeown, Baishakhi Ray
TL;DR
This paper addresses the challenge of solving Zebra puzzles, a class of logic grid problems, by integrating Large Language Models with an off-the-shelf SMT solver in a multi-agent framework (ZPS). The method decomposes each puzzle into subproblems, translates the clues into SMT-LIB constraints, and uses iterative solver feedback to refine the solution. Solutions are scored by an automated grader whose reliability is checked against human judgments. Key findings show substantial performance gains over LLMs alone, with GPT-4 achieving up to 166% more fully correct solutions, and the autograder correlating strongly with human judgments. The work suggests that structured planning, agent feedback, and integration with a formal solver can make natural-language-to-logic problem solving more reliable, with practical implications for automated reasoning tasks.
Abstract
Prior research has enhanced the ability of Large Language Models (LLMs) to solve logic puzzles using techniques such as chain-of-thought prompting or introducing a symbolic representation. These frameworks are still usually insufficient to solve complicated logical problems, such as Zebra puzzles, because of the inherent complexity of translating natural language clues into logical statements. We introduce a multi-agent system, ZPS, that integrates LLMs with an off-the-shelf theorem prover. The system tackles the puzzle-solving task by breaking the problem into smaller, manageable parts, generating SMT (Satisfiability Modulo Theories) code to solve them with the theorem prover, and using feedback between the agents to repeatedly improve their answers. We also introduce an automated grid puzzle grader to assess the correctness of our puzzle solutions, and we show that the grader is reliable by evaluating it in a user study. Our approach improves all three LLMs we tested, with GPT-4 showing a 166% improvement in the number of fully correct solutions.
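To make the clue-to-constraint step concrete, the following is a minimal sketch of how a tiny grid puzzle might be written as SMT-LIB text. It is an illustration under stated assumptions, not the ZPS implementation: the three-person puzzle, the names `PEOPLE`, `CLUES`, `smt_script`, and `brute_force_solutions`, and the integer-position encoding are all hypothetical. To keep the example free of external dependencies, a brute-force enumerator stands in for the SMT solver, and the clues are hand-written rather than produced by an LLM.

```python
from itertools import permutations

# Hypothetical toy puzzle (not from the paper): three people, houses 1..3.
PEOPLE = ["alice", "bob", "carol"]

# Each clue pairs an SMT-LIB assertion with an equivalent Python predicate.
# In a ZPS-like system an LLM agent would generate the SMT-LIB side.
CLUES = [
    # "Alice lives directly to the right of Bob."
    ("(assert (= alice (+ bob 1)))", lambda p: p["alice"] == p["bob"] + 1),
    # "Carol does not live in the first house."
    ("(assert (not (= carol 1)))", lambda p: p["carol"] != 1),
]


def smt_script(names=PEOPLE, clues=CLUES):
    """Build an SMT-LIB script: one Int position per person, bounded and distinct."""
    n = len(names)
    lines = [f"(declare-const {name} Int)" for name in names]
    lines += [f"(assert (and (>= {name} 1) (<= {name} {n})))" for name in names]
    lines.append(f"(assert (distinct {' '.join(names)}))")
    lines += [smt for smt, _ in clues]
    lines += ["(check-sat)", "(get-model)"]
    return "\n".join(lines)


def brute_force_solutions(names=PEOPLE, clues=CLUES):
    """Stand-in for the solver: enumerate every assignment and keep the consistent ones."""
    solutions = []
    for perm in permutations(range(1, len(names) + 1)):
        assignment = dict(zip(names, perm))
        if all(check(assignment) for _, check in clues):
            solutions.append(assignment)
    return solutions


if __name__ == "__main__":
    print(smt_script())
    print(brute_force_solutions())
```

The printed script could be passed to any SMT-LIB-compliant solver. In a feedback loop of the kind the abstract describes, a result of `unsat` or a model that is not unique would be a signal to send the clue translations back for revision.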
