Systematic Analysis of LLM Contributions to Planning: Solver, Verifier, Heuristic
Haoming Li, Zhaoliang Chen, Songyuan Liu, Yiming Lu, Fei Liu
TL;DR
The paper examines how to evaluate large language models (LLMs) in planning by decomposing their contribution into three roles (solver, verifier, and heuristic) and testing each role on fitness, course, and travel planning tasks. It shows that although LLMs often fail to generate correct plans from scratch, they provide useful feedback signals when used as comparative heuristics within tree-search-style reasoning. The formal framework, together with a benchmark that tests whether LLMs can learn user preferences in real time, is intended to guide the design of future LLM-driven planning systems and real-world personalization. The findings suggest that LLMs contribute more through feedback and ranking than through direct plan generation, and they underscore the need for robust verification methods and dynamic constraint handling in constrained planning problems.
Abstract
In this work, we provide a systematic analysis of how large language models (LLMs) contribute to solving planning problems. In particular, we examine how LLMs perform when they are used as a problem solver, as a solution verifier, and as a source of heuristic guidance for improving intermediate solutions. Our analysis reveals that although it is difficult for LLMs to generate correct plans out of the box, LLMs are much better at providing feedback signals on intermediate or incomplete solutions in the form of comparative heuristic functions. This evaluation framework provides insights into how future work may design better LLM-based tree-search algorithms to solve diverse planning and reasoning problems. We also propose a novel benchmark to evaluate an LLM's ability to learn user preferences on the fly, which has wide applications in practical settings.
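
To illustrate what a comparative heuristic inside a tree search can look like, the following is a minimal sketch and not the procedure used in the paper. It runs a generic beam search over partial plans and ranks candidates with a pairwise comparator. All names (`beam_search`, `stub_compare`, `PREREQS`) are hypothetical, and the comparator is a hand-written stub standing in for an LLM call that would be asked which of two partial plans looks better. The toy course-ordering task is likewise an assumption made for illustration.

```python
from functools import cmp_to_key
from typing import Callable, Dict, List, Sequence, Set, Tuple

Plan = Tuple[str, ...]

# Hypothetical toy task: order three courses so prerequisites come first.
PREREQS: Dict[str, Set[str]] = {
    "intro": set(),
    "algorithms": {"intro"},
    "ml": {"algorithms"},
}


def violations(plan: Plan) -> int:
    """Count prerequisites that are missing when each course is taken."""
    count = 0
    for i, course in enumerate(plan):
        count += len(PREREQS[course] - set(plan[:i]))
    return count


def stub_compare(a: Plan, b: Plan) -> int:
    """Stand-in for an LLM pairwise judgment (assumption, not a real API).

    Returns a negative number when plan `a` is preferred over plan `b`.
    """
    return violations(a) - violations(b)


def beam_search(
    actions: Sequence[str],
    is_complete: Callable[[Plan], bool],
    compare: Callable[[Plan, Plan], int],
    beam_width: int = 2,
    max_depth: int = 10,
) -> Plan:
    """Grow partial plans one action at a time, keeping the `beam_width`
    candidates that the comparator ranks best at each depth."""
    beam: List[Plan] = [()]
    for _ in range(max_depth):
        candidates = [
            plan + (action,)
            for plan in beam
            for action in actions
            if action not in plan
        ]
        if not candidates:
            break
        # The comparator only says which of two plans is better; sorting
        # turns those pairwise judgments into a ranking.
        candidates.sort(key=cmp_to_key(compare))
        beam = candidates[:beam_width]
        finished = [plan for plan in beam if is_complete(plan)]
        if finished:
            return finished[0]
    return beam[0]


plan = beam_search(
    actions=["ml", "algorithms", "intro"],
    is_complete=lambda p: len(p) == len(PREREQS),
    compare=stub_compare,
)
print(plan)
```

In this sketch the comparator never has to produce a full plan. It only ranks partial candidates, which is the role the abstract describes LLMs as being better suited to. A real system would replace `stub_compare` with a model query and would likely need to handle inconsistent pairwise judgments, which plain sorting does not address.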
