A Solver-in-the-Loop Framework for Improving LLMs on Answer Set Programming for Logic Puzzle Solving
Timo Pierre Schrader, Lukas Lange, Tobias Kaminski, Simon Razniewski, Annemarie Friedrich
TL;DR
This work tackles the difficulty of generating correct Answer Set Programming (ASP) code with large language models by closing the loop with an ASP solver. It introduces a solver-in-the-loop framework that (i) samples partial ASP encodings, (ii) uses solver feedback to label the sampled continuations as chosen or rejected for supervised fine-tuning (SFT), and (iii) applies solver-guided best-of-N sampling at test time with a carefully designed reward to improve robustness. The approach yields consistent gains across two prompting settings and two datasets (LogicPuzzles and GridPuzzles) for multiple open-weight LLMs, including notable improvements for smaller models after SFT and for GPT-4.1-mini under inference-time guidance. A key contribution is the combination of silver-standard training data generated from solver feedback with solver-based inference to steer ASP code generation without extra human annotations, enabling more reliable domain-specific coding with LLMs.
Abstract
The rise of large language models (LLMs) has sparked interest in coding assistants. While general-purpose programming languages are well supported, generating code for domain-specific languages remains a challenging problem for LLMs. In this paper, we focus on the LLM-based generation of code for Answer Set Programming (ASP), a particularly effective approach for finding solutions to combinatorial search problems. The effectiveness of LLMs in ASP code generation is currently hindered by the limited number of examples seen during their initial pre-training phase. We introduce a novel ASP-solver-in-the-loop approach for solver-guided instruction-tuning of LLMs to address the highly complex semantic parsing task inherent in ASP code generation. Our method requires only problem specifications in natural language and their solutions. Specifically, we sample ASP statements as program continuations from LLMs for solving logic puzzles. Leveraging the special property of declarative ASP programming that partial encodings increasingly narrow down the solution space, we categorize the sampled statements into chosen and rejected instances based on solver feedback. We then apply supervised fine-tuning to train LLMs on the curated data and further improve robustness using a solver-guided search that includes best-of-N sampling. Our experiments demonstrate consistent improvements in two distinct prompting settings on two datasets.
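The labeling idea described above can be illustrated with a small Python sketch. It is not the paper's implementation. A brute-force enumerator stands in for the ASP solver, constraints are Python predicates rather than ASP statements, and the labeling rule (a continuation is "chosen" if the known solution is still among the remaining models) and the `reward` function are assumptions made only for illustration. All names (`solve`, `label`, `reward`, the three-person puzzle) are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, List

Assignment = Dict[str, int]              # person -> house number
Constraint = Callable[[Assignment], bool]

PEOPLE = ["ann", "bob", "cy"]


def solve(constraints: List[Constraint]) -> List[Assignment]:
    """Toy stand-in for an ASP solver: enumerate every assignment of
    people to distinct houses that satisfies all constraints."""
    models = []
    for perm in permutations(range(1, len(PEOPLE) + 1)):
        assignment = dict(zip(PEOPLE, perm))
        if all(c(assignment) for c in constraints):
            models.append(assignment)
    return models


def label(partial: List[Constraint], candidate: Constraint,
          gold: Assignment) -> str:
    """Assumed labeling rule: keep a continuation only if the known
    solution survives after adding it to the partial encoding."""
    return "chosen" if gold in solve(partial + [candidate]) else "rejected"


def reward(partial: List[Constraint], candidate: Constraint) -> float:
    """Hypothetical test-time reward (no gold solution available):
    penalize unsatisfiable programs, otherwise prefer continuations
    that narrow the solution space more."""
    before = len(solve(partial))
    after = len(solve(partial + [candidate]))
    if after == 0:                       # program became unsatisfiable
        return -1.0
    return (before - after) / before     # fraction of models eliminated


# Known solution and a partial encoding with one constraint so far.
gold = {"ann": 2, "bob": 1, "cy": 3}
partial = [lambda a: a["cy"] == 3]

# Sampled continuations (in the paper these would be ASP statements).
candidates = {
    "ann_right_of_bob": lambda a: a["ann"] > a["bob"],
    "ann_in_house_1": lambda a: a["ann"] == 1,
    "bob_in_house_3": lambda a: a["bob"] == 3,
}

labels = {name: label(partial, c, gold) for name, c in candidates.items()}
rewards = {name: reward(partial, c) for name, c in candidates.items()}
```

In this toy run, only `ann_right_of_bob` keeps the known solution and would be labeled as chosen. The toy reward alone cannot separate it from `ann_in_house_1`, since both eliminate half of the remaining models, which suggests why a reward used without gold solutions needs more careful design than this sketch provides.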
