Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models
Houjun Liu
TL;DR
The paper tackles multi-step reasoning in large language models by framing problem solving as a Partially Observable Markov Decision Process (POMDP) and solving it with the online solver POMCP. The resulting method, Plan of Thoughts (PoT), uses the language model's own reflections about subproblem states as value-based heuristics to guide search. It employs a hybrid prompting scheme in which GPT-4 handles posterior sampling and evaluation while GPT-3.5-Turbo-Instruct generates intermediate thoughts. On the Game of 24 benchmark, PoT achieves an 89.4% success rate, surpassing both Chain-of-Thought and Tree of Thoughts baselines, and it exhibits strong anytime performance: much of the solving happens within the early time window. The approach offers a heuristic-guided planning framework intended to extend LM reasoning to larger, more complex tasks, at the price of higher computational cost and reliance on large-model prompts.
Abstract
While language models (LMs) offer significant capability in zero-shot reasoning tasks across a wide range of domains, they do not perform satisfactorily on problems that require multi-step reasoning. Previous approaches to mitigating this involve breaking a larger, multi-step task into sub-tasks, asking the language model to generate proposals ("thoughts") for each sub-task, and using exhaustive planning approaches such as DFS to compose a solution. In this work, we build on this idea and make two new contributions: first, we formalize a planning-based approach to multi-step problem solving with LMs via Partially Observable Markov Decision Processes (POMDPs), with the LM's own reflections about the value of a state used as a search heuristic; second, leveraging the online POMDP solver POMCP, we demonstrate a superior success rate of 89.4% on the Game of 24 task compared to existing approaches, while also offering better anytime performance characteristics than the fixed tree search used previously. Taken together, these contributions allow modern LMs to decompose and solve larger-scale reasoning tasks more effectively.
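To make "thoughts" and "search heuristic" concrete, the following is a minimal sketch of heuristic-guided search on the Game of 24. It is not the paper's method: it uses plain greedy best-first search rather than POMCP, enumerates thoughts exhaustively rather than sampling them from an LM, and replaces the LM's reflection with a hypothetical numeric stand-in, `toy_value`. All function names and the `max_expansions` budget are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations
import heapq

TARGET = Fraction(24)

def successors(state):
    """Yield (step description, next state) for each way to combine two numbers.

    Each step plays the role of one "thought": it reduces the state by one number.
    """
    for i, j in combinations(range(len(state)), 2):
        a, b = state[i], state[j]
        rest = [state[k] for k in range(len(state)) if k not in (i, j)]
        candidates = [
            (f"{a} + {b}", a + b),
            (f"{a} * {b}", a * b),
            (f"{a} - {b}", a - b),
            (f"{b} - {a}", b - a),
        ]
        if b != 0:
            candidates.append((f"{a} / {b}", a / b))
        if a != 0:
            candidates.append((f"{b} / {a}", b / a))
        for desc, value in candidates:
            yield f"{desc} = {value}", tuple(sorted(rest + [value]))

def toy_value(state):
    """Hypothetical stand-in for an LM reflection; lower means more promising."""
    return min(abs(x - TARGET) for x in state)

def solve(numbers, max_expansions=10_000):
    """Greedy best-first search; returns a list of steps, or None if none is found."""
    start = tuple(sorted(Fraction(n) for n in numbers))
    # Heap entries: (heuristic, tie-breaker, state, steps so far).
    frontier = [(toy_value(start), 0, start, [])]
    seen = {start}
    counter = 1
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, state, steps = heapq.heappop(frontier)
        expansions += 1
        if len(state) == 1:
            if state[0] == TARGET:
                return steps
            continue
        for desc, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (toy_value(nxt), counter, nxt, steps + [desc]))
                counter += 1
    return None

if __name__ == "__main__":
    for step in solve([4, 9, 10, 13]) or ["no solution found"]:
        print(step)
```

The expansion budget is a loose analogue of stopping a search early; in the paper's setting the expensive operations are LM calls, so the quality of the heuristic determines how soon a solution appears.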
