Logic.py: Bridging the Gap between LLMs and Constraint Solvers
Pascal Kesseli, Peter O'Hearn, Ricardo Silveira Cabral
TL;DR
This work tackles the challenge of solving complex logical reasoning tasks with large language models by introducing Logic.py, a Python-based domain-specific language designed to express logic and search constraints succinctly. A dedicated Logic Agent translates informal problem descriptions into Logic.py formalisations and leverages a constraint solver via a libCST-translated CBMC backend, effectively combining LLM capabilities with symbolic reasoning. The approach achieves strong empirical gains on ZebraLogicBench, reaching over $90\%$ accuracy and delivering a $65\%$ absolute improvement over the $24.9\%$ baseline of LLama 3.1 70B, thereby demonstrating the power of DSL-grounded problem formalisation paired with external solvers. This indicates significant potential for extending LLM-based problem solving to broader domains by grounding natural language in domain-specific formalisms and solver technology.
Abstract
We present a novel approach to formalise and solve search-based problems using large language models, which significantly improves upon previous state-of-the-art results. We demonstrate the efficacy of this approach on the logic puzzles benchmark ZebraLogicBench. Instead of letting the LLM attempt to directly solve the puzzles, our method prompts the model to formalise the problem in a logic-focused domain-specific language (DSL) called Logic.py. This formalised representation is then solved using a constraint solver, leveraging the strengths of both the language model and the solver. Our approach achieves a remarkable 65% absolute improvement over the baseline performance of Llama 3.1 70B on ZebraLogicBench, setting a new state-of-the-art with an accuracy of over 90%. This significant advancement demonstrates the potential of combining language models with domain-specific languages and auxiliary tools on traditionally challenging tasks for LLMs.
