PEA: Enhancing LLM Performance on Computational-Reasoning Tasks
Zi Wang, Shiwei Weng, Mohannad Alhanahnah, Somesh Jha, Tom Reps
TL;DR
This work tackles the challenge of formalizing and improving computational reasoning with large language models (LLMs) by introducing the Predicate-Enumeration-Aggregation (PEA) framework, which models problems as quantified predicates over finite domains and delegates solving to synthesized programs. By decomposing a problem into predicate evaluation, enumeration of candidate solutions, and aggregation of results, PEA leverages LLMs' coding capabilities to produce executable reasoning generators that yield concrete results when run. Empirical evaluations on SAT variants, the Game of 24, and planning benchmarks show an average accuracy improvement of approximately $50\%$ together with notable efficiency gains, with perfect performance on several tasks and robust performance across model variants. The approach highlights the practical benefit of translating reasoning into programmable, verifiable steps, and suggests that targeted pruning and optimization could extend it to more complex or hybrid reasoning tasks. More broadly, the framework offers a principled, reusable way to integrate logical reasoning with programmatic problem-solving in real-world settings.
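To make the predicate/enumeration/aggregation split concrete, here is a minimal sketch applied to Boolean satisfiability. It is not the authors' implementation: the names `pea`, `all_assignments`, `satisfies`, `find_model`, and `count_models` are hypothetical, and the signed-integer clause encoding (literal `k > 0` means x_k, `k < 0` means NOT x_k) is an assumption borrowed from common CNF conventions. Enumeration ranges over all 2^n assignments, the predicate checks that every clause is satisfied, and swapping only the aggregation rule turns an existential query ("is there a model?") into a counting query.

```python
from itertools import product
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def pea(enumerate_domain: Callable[[], Iterable[T]],
        predicate: Callable[[T], bool],
        aggregate: Callable[[Iterable[T]], R]) -> R:
    """Hypothetical PEA skeleton: enumerate a finite domain, keep the candidates
    that satisfy the predicate, and fold them with an aggregation rule."""
    # A generator expression keeps evaluation lazy, so an existential
    # aggregator can stop at the first witness.
    return aggregate(c for c in enumerate_domain() if predicate(c))

# Assumed clause encoding for this sketch: literal k > 0 means x_k,
# literal k < 0 means NOT x_k, with variables numbered from 1.
Clause = list[int]
Assignment = tuple[bool, ...]

def all_assignments(n_vars: int) -> Callable[[], Iterable[Assignment]]:
    # Enumeration: the finite domain of all 2^n truth assignments.
    return lambda: product([False, True], repeat=n_vars)

def satisfies(cnf: list[Clause]) -> Callable[[Assignment], bool]:
    # Predicate: every clause contains at least one true literal.
    return lambda a: all(any(a[abs(lit) - 1] == (lit > 0) for lit in clause)
                         for clause in cnf)

def find_model(cnf: list[Clause], n_vars: int) -> Optional[Assignment]:
    # Existential aggregation: the first satisfying assignment, or None.
    return pea(all_assignments(n_vars), satisfies(cnf),
               lambda sols: next(iter(sols), None))

def count_models(cnf: list[Clause], n_vars: int) -> int:
    # Counting aggregation (#SAT): same enumeration and predicate, different fold.
    return pea(all_assignments(n_vars), satisfies(cnf),
               lambda sols: sum(1 for _ in sols))
```

For (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3), encoded as `[[1, 2], [-1, 3], [-2, -3]]`, `find_model(cnf, 3)` returns `(False, True, False)` and `count_models(cnf, 3)` returns `2`. Brute-force enumeration is exponential in the number of variables; the TL;DR's mention of targeted pruning points at exactly this cost.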
Abstract
Large Language Models (LLMs) have exhibited remarkable capabilities across diverse domains, prompting investigations into their potential as generic reasoning engines. While recent studies have explored inference-time computation to enhance model performance on complex problems, current research lacks a formal framework to characterize the complexity of reasoning tasks. This study introduces the Predicate-Enumeration-Aggregation (PEA) framework, a formal approach to describing and solving an important class of reasoning tasks termed computational reasoning problems. The PEA framework decomposes these problems into predicate and enumeration components and uses LLMs to synthesize programs from specified predicate, enumeration, and aggregation rules. These synthesized programs are then executed to obtain solutions to the computational tasks. We demonstrate the framework's efficacy on benchmark tasks including Boolean satisfiability problems, the Game of 24, and planning problems. Empirical evaluation shows that PEA substantially enhances the performance of the underlying models on these benchmarks, yielding an average accuracy improvement of approximately $50\%$ together with increased efficiency.
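The central idea, that the answer comes from executing a synthesized program rather than from the model's free-form reasoning, can be illustrated with the Game of 24 (combine four numbers with +, -, *, / to reach 24). The sketch below shows the kind of program such a pipeline might have an LLM write; it is an assumption-laden illustration, not code from the paper. The names `expressions`, `is_24`, and `solve_24` are hypothetical, and using exact `Fraction` arithmetic is our own choice, so that inputs such as 3, 3, 8, 8 (solved by 8 / (3 - 8 / 3), which passes through non-integer intermediates) are not lost to floating-point rounding.

```python
from fractions import Fraction
from typing import Iterator, List, Optional, Tuple

Item = Tuple[Fraction, str]  # (exact value, expression text)

def expressions(items: List[Item]) -> Iterator[Item]:
    """Enumeration: every expression obtained by repeatedly replacing two
    items with their sum, difference, product, or quotient."""
    if len(items) == 1:
        yield items[0]
        return
    n = len(items)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # ordered pairs cover both a - b and b - a
            (a, ea), (b, eb) = items[i], items[j]
            rest = [items[k] for k in range(n) if k not in (i, j)]
            combined = [(a + b, f"({ea} + {eb})"),
                        (a - b, f"({ea} - {eb})"),
                        (a * b, f"({ea} * {eb})")]
            if b != 0:  # skip division by zero
                combined.append((a / b, f"({ea} / {eb})"))
            for item in combined:
                yield from expressions(rest + [item])

def is_24(item: Item) -> bool:
    # Predicate: the expression evaluates exactly to 24.
    return item[0] == 24

def solve_24(nums: List[int]) -> Optional[str]:
    # Existential aggregation: return the first witness expression, or None.
    start = [(Fraction(x), str(x)) for x in nums]
    return next((item[1] for item in expressions(start) if is_24(item)), None)
```

`solve_24([4, 1, 8, 7])` and `solve_24([3, 3, 8, 8])` each return some parenthesized expression equal to 24 (which one depends on enumeration order), while `solve_24([1, 1, 1, 1])` returns `None`. Because the result is a concrete expression, it can be checked independently of the model that produced the program.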
