
Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems

Shmuel Berman, Kathleen McKeown, Baishakhi Ray

TL;DR

This paper addresses the challenge of solving Zebra puzzles, a class of logic grid problems, by integrating Large Language Models with an off-the-shelf SMT solver in a multi-agent framework (ZPS). The method decomposes each puzzle into subproblems, translates the clues into SMT-LIB constraints, and uses iterative solver feedback to refine its solutions; an automated grader, validated by human judges, scores the results. Key findings show substantial performance gains over LLMs alone, with GPT-4 achieving up to 166% more fully correct solutions, and the autograder correlating strongly with human judgments. The work demonstrates that structured planning, agent feedback, and integration with formal reasoning can robustly enhance natural-language-to-logic problem solving, with practical implications for automated reasoning tasks.
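
To make the clue-to-constraint step concrete, here is a minimal Python sketch that emits an SMT-LIB 2 script for a toy three-house puzzle. In ZPS the LLM agent writes the SMT-LIB directly from the natural-language clues; this sketch skips that step and assumes the clues were already parsed into hypothetical `(relation, a, b)` triples. The function name `to_smtlib`, the relation names, and the encoding (each entity is an integer house position) are illustrative assumptions, not the paper's code.

```python
from typing import Dict, List, Tuple

# Hypothetical pre-parsed clue: (relation, entity, entity or house number).
Clue = Tuple[str, str, str]


def to_smtlib(categories: Dict[str, List[str]], n_houses: int,
              clues: List[Clue]) -> str:
    """Emit an SMT-LIB 2 script in which every entity is an Int house position."""
    lines = ["(set-option :produce-models true)", "(set-logic QF_LIA)"]
    for members in categories.values():
        for name in members:
            lines.append(f"(declare-const {name} Int)")
            # Each entity sits in one of the houses 1..n_houses.
            lines.append(f"(assert (and (>= {name} 1) (<= {name} {n_houses})))")
        if len(members) > 1:
            # Entities of the same category occupy different houses.
            lines.append(f"(assert (distinct {' '.join(members)}))")
    for relation, a, b in clues:
        if relation in ("same_house", "in_house"):
            lines.append(f"(assert (= {a} {b}))")  # b is an entity or a house number
        elif relation == "left_of":
            lines.append(f"(assert (= (+ {a} 1) {b}))")  # a is immediately left of b
        else:
            raise ValueError(f"unknown relation: {relation}")
    lines += ["(check-sat)", "(get-model)"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_smtlib(
        {"color": ["red", "green", "blue"], "nation": ["english", "spanish", "japanese"]},
        3,
        [("same_house", "english", "red"),  # The Englishman lives in the red house.
         ("left_of", "green", "blue"),      # Green is immediately left of blue.
         ("in_house", "japanese", "1")],    # The Japanese lives in house 1.
    ))
```

Saved to a `.smt2` file, the script can be checked by an SMT-LIB 2 solver such as Z3: a `sat` answer with a model assigns house positions, while `unsat` indicates contradictory constraints, the kind of signal a feedback loop can pass back to the translating agent.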

Abstract

Prior research has enhanced the ability of Large Language Models (LLMs) to solve logic puzzles using techniques such as chain-of-thought prompting or introducing a symbolic representation. These frameworks, however, are usually still insufficient for complicated logical problems such as Zebra puzzles, owing to the inherent difficulty of translating natural-language clues into logical statements. We introduce a multi-agent system, ZPS, that integrates LLMs with an off-the-shelf theorem prover. The system tackles the complex puzzle-solving task by breaking the problem down into smaller, manageable parts, generating SMT (Satisfiability Modulo Theories) code to solve them with a theorem prover, and using feedback between the agents to iteratively improve their answers. We also introduce an automated grid puzzle grader to assess the correctness of our puzzle solutions, and we show through a user study that the grader is reliable. Our approach improves performance for all three LLMs we tested, with GPT-4 showing a 166% improvement in the number of fully correct solutions.
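
The abstract describes the automated grader only at a high level. As a minimal sketch of one plausible approach, the Python below grades a solution cell by cell, assuming the model's answer has already been parsed into a grid of house → category → value. The `grade` function, the `Grade` record, and the case-insensitive matching rule are hypothetical; the paper's grader may parse and match answers differently.

```python
from typing import Dict, NamedTuple

Grid = Dict[str, Dict[str, str]]  # house label -> {category: value}


class Grade(NamedTuple):
    correct_cells: int
    total_cells: int
    fully_correct: bool


def _norm(value: str) -> str:
    # Tolerate differences in case and surrounding whitespace only.
    return value.strip().lower()


def grade(predicted: Grid, gold: Grid) -> Grade:
    """Compare a predicted solution grid with the gold grid, cell by cell.

    Cells missing from the prediction are counted as wrong."""
    total = correct = 0
    for house, gold_row in gold.items():
        pred_row = predicted.get(house, {})
        for category, gold_value in gold_row.items():
            total += 1
            if _norm(pred_row.get(category, "")) == _norm(gold_value):
                correct += 1
    return Grade(correct, total, total > 0 and correct == total)


if __name__ == "__main__":
    gold = {"1": {"color": "green", "nation": "japanese"},
            "2": {"color": "blue", "nation": "spanish"}}
    pred = {"1": {"color": "Green", "nation": "japanese"},
            "2": {"color": "red", "nation": "spanish"}}
    print(grade(pred, gold))  # Grade(correct_cells=3, total_cells=4, fully_correct=False)
```

Counting puzzles whose `fully_correct` flag is true gives a "fully correct solutions" tally of the kind the 166% figure is reported on, while `correct_cells / total_cells` offers a softer partial-credit view.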

Paper Structure (31 sections, 2 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: An Example Zebra Puzzle.
  • Figure 2: Logic Puzzle Solver Workflow
  • Figure 3: Example Feedback Puzzle Solving Process. The puzzle is decomposed, and the LLM agent then attempts to translate it into a logical SMT formula. The theorem prover attempts to solve the formula, and its output is returned to the LLM agent as feedback so that the agent can revise its formal representation.
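
The feedback loop in Figure 3 can be sketched in Python as follows. This is a minimal illustration under loud assumptions: the theorem prover is replaced by a brute-force search over house assignments (a real system would call an SMT solver), clues are Python predicates rather than SMT-LIB, and the LLM agent is a hypothetical `translate` callback that receives the previous feedback message. The names `solve`, `feedback`, and `refine` and the wording of the feedback messages are illustrative, not taken from the paper.

```python
from itertools import permutations, product
from typing import Callable, Dict, List, Optional

Assignment = Dict[str, int]                # entity -> house number (1..n)
Constraint = Callable[[Assignment], bool]  # one translated clue


def solve(categories: Dict[str, List[str]], n: int,
          constraints: List[Constraint], limit: int = 2) -> List[Assignment]:
    """Brute-force stand-in for the theorem prover; only viable for tiny puzzles.

    Stops after `limit` models, since two suffice to show the answer is not unique.
    Assumes every category lists exactly n entities."""
    groups = list(categories.values())
    models: List[Assignment] = []
    for perms in product(*(permutations(range(1, n + 1)) for _ in groups)):
        a = {name: house
             for group, perm in zip(groups, perms)
             for name, house in zip(group, perm)}
        if all(c(a) for c in constraints):
            models.append(a)
            if len(models) >= limit:
                break
    return models


def feedback(models: List[Assignment]) -> Optional[str]:
    """Turn solver output into a message for the LLM agent; None means solved."""
    if not models:
        return "unsat: the constraints contradict each other; re-check each clue."
    if len(models) > 1:
        return "ambiguous: several solutions exist; a clue may be missing or too weak."
    return None


def refine(translate: Callable[[Optional[str]], List[Constraint]],
           categories: Dict[str, List[str]], n: int,
           max_rounds: int = 3) -> Optional[Assignment]:
    """Translate, solve, and feed back until a unique solution or the round limit."""
    message: Optional[str] = None
    for _ in range(max_rounds):
        constraints = translate(message)  # hypothetical LLM agent call
        models = solve(categories, n, constraints)
        message = feedback(models)
        if message is None:
            return models[0]
    return None


if __name__ == "__main__":
    cats = {"color": ["red", "green", "blue"],
            "nation": ["english", "spanish", "japanese"]}
    clues = [
        lambda a: a["english"] == a["red"],     # The Englishman lives in the red house.
        lambda a: a["green"] + 1 == a["blue"],  # Green is immediately left of blue.
        lambda a: a["japanese"] == 1,           # The Japanese lives in house 1.
    ]
    # Scripted stand-in for the LLM: the first draft omits a clue, the second fixes it.
    drafts = iter([clues[:2], clues])
    print(refine(lambda msg: next(drafts), cats, 3))
    # {'red': 3, 'green': 1, 'blue': 2, 'english': 3, 'spanish': 2, 'japanese': 1}
```

Capping `solve` at two models is a deliberate choice in this sketch: it is enough to distinguish "no solution", "unique solution", and "ambiguous", the three cases the feedback message covers, without enumerating the whole search space.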