What's the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models
Eran Hirsch, Guy Uziel, Ateret Anaby-Tavor
TL;DR
This work addresses the gap between large language models and classical planning. It diagnoses LLMs’ weaknesses in world modeling and action reasoning, and proposes SimPlan, a hybrid planner that couples Greedy Best-First Search with external world modeling and a ColBERT-based action-ranking heuristic. The core contribution is a generalized planning framework that maintains an explicit world model via an external tool while leveraging LLMs to score actions, which enables efficient state-space exploration. The search ranks candidate plans by the cost $\mathrm{Cost}(\pi) = -\frac{1}{n}\sum_{i=1}^{n} \log P_\theta(a_i \mid s_{i-1}, G)$, the length-normalized negative log-likelihood of the $n$ actions $a_i$ in plan $\pi$, each conditioned on the preceding state $s_{i-1}$ and the goal $G$. Across five diverse domains (Blocksworld, Ferry, Grippers, Depots, Minigrid), each in simple and complex configurations, SimPlan achieves higher success rates than prior LLM-based planners, though challenges remain in the most complex Depots domain and with very large problem sizes. The work also introduces a generalized planning setup, data augmentation to reduce identifier bias, and comprehensive ablations demonstrating the necessity of state updates and external world modeling. Together, these contributions advance practical planning capabilities for real-world agent systems and provide a blueprint for further improvements in hybrid planning.
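To make the cost formula concrete, here is a minimal Python sketch of a best-first search whose frontier is ordered by the length-normalized negative log-likelihood of each partial plan. It is an illustration under stated assumptions, not SimPlan's implementation: `successors` is a hypothetical stand-in for the external world model, `action_prob` is a hypothetical stand-in for the learned action scorer $P_\theta(a_i \mid s_{i-1}, G)$, and the counter domain at the end is a toy invented for this example.

```python
import heapq
import math
from typing import Callable, Hashable, Iterable, List, Optional, Sequence, Tuple

State = Hashable
Action = str


def plan_cost(step_probs: Sequence[float]) -> float:
    """Cost(pi) = -(1/n) * sum_i log P(a_i | s_{i-1}, G) for a plan of n steps."""
    if not step_probs:
        return 0.0  # the empty plan has no steps to score
    return -sum(math.log(p) for p in step_probs) / len(step_probs)


def best_first_search(
    start: State,
    is_goal: Callable[[State], bool],
    successors: Callable[[State], Iterable[Tuple[Action, State]]],  # assumed world model
    action_prob: Callable[[State, Action], float],  # assumed action scorer
    max_expansions: int = 1000,
) -> Optional[List[Action]]:
    """Expand the lowest-cost partial plan first; return a plan or None."""
    tie = 0  # tie-breaker so the heap never has to compare states
    frontier = [(0.0, tie, start, [], [])]
    visited = {start}
    while frontier and max_expansions > 0:
        _, _, state, plan, probs = heapq.heappop(frontier)
        if is_goal(state):
            return plan
        max_expansions -= 1
        for action, next_state in successors(state):
            if next_state in visited:
                continue  # each state is queued at most once
            visited.add(next_state)
            new_probs = probs + [action_prob(state, action)]
            tie += 1
            heapq.heappush(
                frontier,
                (plan_cost(new_probs), tie, next_state, plan + [action], new_probs),
            )
    return None


# Toy domain (hypothetical): move an integer counter from 0 to 3.
def toy_successors(state: int) -> List[Tuple[Action, int]]:
    return [("inc", state + 1), ("dec", state - 1)]


def toy_action_prob(state: int, action: Action) -> float:
    return 0.9 if action == "inc" else 0.1  # scorer prefers moving toward the goal


plan = best_first_search(0, lambda s: s == 3, toy_successors, toy_action_prob)
```

Dividing by the plan length keeps longer plans from being penalized merely for having more steps, so the frontier favors plans whose individual actions the scorer rates as likely.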
Abstract
Planning is a fundamental task in artificial intelligence that involves finding a sequence of actions that achieves a specified goal in a given environment. Large language models (LLMs) are increasingly used for applications that require planning capabilities, such as web or embodied agents. In line with recent studies, we demonstrate through experimentation that LLMs lack skills required for planning. Based on these observations, we advocate for a hybrid approach that combines LLMs with classical planning methodology. We then introduce SimPlan, a novel hybrid method, and evaluate its performance in a new, challenging setup. Our extensive experiments across various planning domains demonstrate that SimPlan significantly outperforms existing LLM-based planners.
