Large Language Models as Common-Sense Heuristics
Andrey Borro, Patricia J Riddle, Michael W Barley, Michael J Witbrock
TL;DR
The paper addresses the generation of executable plans with Large Language Models (LLMs) by treating LLM outputs as common-sense heuristics for a hill-climbing local search in the VirtualHome household environment. It introduces two types of guides (high-level and low-level) and an error-correcting, prompt-driven action-selection pipeline that operates directly in the environment's native representation, eliminating translation steps. Empirically, guide-assisted local search achieves about 69% task success, outperforming ProgPrompt by roughly 22 percentage points, while maintaining executability even under adversarial conditions. The work demonstrates that strong planning performance can be achieved without intermediate languages, with practical implications for real-world household task planning and a call for open-source reproducibility.
Abstract
While systems designed for solving planning tasks vastly outperform Large Language Models (LLMs) in this domain, they usually discard the rich semantic information embedded within task descriptions. In contrast, LLMs possess parametrised knowledge across a wide range of topics, enabling them to leverage the natural language descriptions of planning tasks in their solutions. However, current research in this direction faces challenges in generating correct and executable plans. Furthermore, these approaches depend on the LLM to output solutions in an intermediate language, which must be translated into the representation language of the planning task. We introduce a novel planning method that leverages the parametrised knowledge of LLMs by using their output as a heuristic for Hill-Climbing Search. This approach is further enhanced by prompting the LLM to generate a solution estimate to guide the search. Our method exceeds the task success rate of similar systems within a common household environment by 22 percentage points while producing consistently executable plans. All actions are encoded in their original representation, demonstrating that strong results can be achieved without an intermediate language, thus eliminating the need for a translation step.
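
The core idea of the abstract, hill-climbing over actions ranked by an LLM-derived heuristic and steered by a solution estimate, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the toy "milk in the fridge" domain, the action strings, the `GUIDE` list standing in for an LLM-generated solution estimate, and `stub_llm_score`, which replaces a real LLM call with a simple match against the guide.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy state: (holding_milk, fridge_open, milk_in_fridge).
State = Tuple[bool, bool, bool]


def applicable_actions(state: State) -> Dict[str, State]:
    """Map each action that is executable in `state` to its successor state."""
    holding, fridge_open, stored = state
    actions: Dict[str, State] = {}
    if not holding and not stored:
        actions["grab milk"] = (True, fridge_open, stored)
    if fridge_open:
        actions["close fridge"] = (holding, False, stored)
    else:
        actions["open fridge"] = (holding, True, stored)
    if holding and fridge_open:
        actions["put milk in fridge"] = (False, fridge_open, True)
    return actions


def is_goal(state: State) -> bool:
    """Goal: milk stored, hands free, fridge closed."""
    return state == (False, False, True)


# Stand-in for an LLM-generated solution estimate (assumed, not from the paper).
GUIDE: List[str] = ["grab milk", "open fridge", "put milk in fridge", "close fridge"]


def stub_llm_score(action: str, plan_so_far: List[str]) -> float:
    """Stand-in for an LLM heuristic: favour the guide step expected next."""
    next_index = len(plan_so_far)
    if next_index < len(GUIDE) and action == GUIDE[next_index]:
        return 1.0
    return 0.0


def hill_climb(
    start: State,
    score: Callable[[str, List[str]], float],
    max_steps: int = 10,
) -> Optional[List[str]]:
    """Greedy hill-climbing: always commit to the best-scoring applicable action."""
    state, plan = start, []
    for _ in range(max_steps):
        if is_goal(state):
            return plan
        options = applicable_actions(state)
        if not options:
            return None  # dead end: no executable action
        best = max(options, key=lambda action: score(action, plan))
        plan.append(best)
        state = options[best]
    return plan if is_goal(state) else None


plan = hill_climb((False, False, False), stub_llm_score)
print(plan)
```

Because the search only ever chooses among actions that are applicable in the current state, any plan this sketch returns is executable by construction in the toy domain, and the actions are never translated out of the form the environment itself uses. In a real system the scoring function would query an LLM rather than consult a fixed list.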
