Systematic Analysis of LLM Contributions to Planning: Solver, Verifier, Heuristic

Haoming Li, Zhaoliang Chen, Songyuan Liu, Yiming Lu, Fei Liu

TL;DR

The paper evaluates large language models (LLMs) in planning by decomposing their role into three components (solver, verifier, and heuristic) and testing each on fitness, course, and travel planning tasks. It finds that LLMs often struggle to generate correct plans from scratch but provide useful feedback signals when used as comparative heuristics within tree-search-style reasoning. The evaluation framework offers guidance for designing future LLM-driven planning systems, and a proposed benchmark measures how well an LLM learns user preferences on the fly, which matters for real-world personalization. Overall, the findings suggest that LLMs contribute more through feedback and ranking than through direct plan generation, and they underscore the need for robust verification methods and dynamic constraint handling in constrained planning problems.

Abstract

In this work, we provide a systematic analysis of how large language models (LLMs) contribute to solving planning problems. In particular, we examine how LLMs perform when they are used as a problem solver, as a solution verifier, and as heuristic guidance for improving intermediate solutions. Our analysis reveals that although it is difficult for LLMs to generate correct plans out of the box, they are much better at providing feedback signals on intermediate or incomplete solutions in the form of comparative heuristic functions. This evaluation framework provides insights into how future work may design better LLM-based tree-search algorithms to solve diverse planning and reasoning problems. We also propose a novel benchmark to evaluate LLMs' ability to learn user preferences on the fly, which has wide applications in practical settings.
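
To make the idea of a comparative heuristic concrete, the following is a minimal sketch of a beam search that ranks partial plans using only pairwise comparisons. It is not the authors' implementation: the toy task (filling a session of a target length from fixed activity durations), the names `stub_compare` and `beam_search`, and the choice of beam search are all assumptions made for illustration. In a real system the comparator would be an LLM call that judges which of two partial plans looks more promising; here a deterministic stand-in takes its place so the example runs on its own.

```python
from functools import cmp_to_key
from typing import Callable, Sequence, Tuple

# A partial plan: durations (in minutes) of the activities chosen so far.
Plan = Tuple[int, ...]


def stub_compare(a: Plan, b: Plan, target: int) -> int:
    """Hypothetical stand-in for an LLM pairwise judgment.

    Returns -1 if `a` looks more promising than `b`, 1 if less, 0 if tied.
    The stub simply prefers the plan whose total is closer to the target.
    """
    da, db = abs(target - sum(a)), abs(target - sum(b))
    return (da > db) - (da < db)


def beam_search(
    options: Sequence[int],
    target: int,
    compare: Callable[[Plan, Plan, int], int],
    beam_width: int = 2,
    max_steps: int = 4,
) -> Plan:
    """Expand partial plans step by step, keeping the top `beam_width`
    candidates as ranked by the pairwise comparator."""
    beam = [()]
    for _ in range(max_steps):
        # Extend every plan in the beam by one activity, never overshooting.
        candidates = [
            plan + (o,)
            for plan in beam
            for o in options
            if sum(plan) + o <= target
        ]
        if not candidates:
            break
        # Rank candidates using comparisons only, with no absolute scores.
        candidates.sort(key=cmp_to_key(lambda a, b: compare(a, b, target)))
        beam = candidates[:beam_width]
        for plan in beam:
            if sum(plan) == target:
                return plan
    return beam[0]


result = beam_search(options=[10, 15, 20], target=45, compare=stub_compare)
print(result, sum(result))
```

The search never asks the comparator for an absolute score, only for which of two candidates is better, which mirrors the comparative-heuristic role described in the abstract. Swapping `stub_compare` for a model-backed comparator would leave the rest of the loop unchanged, although a learned comparator may be inconsistent across pairs, which a plain sort does not guard against.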

Paper Structure

This paper contains 24 sections, 2 equations, 1 figure, 9 tables.

Figures (1)

  • Figure 1: Illustration of Proposed Fitness Planning Framework