Automating Thought of Search: A Journey Towards Soundness and Completeness
Daniel Cao, Michael Katz, Harsha Kokel, Kavitha Srinivas, Shirin Sohrabi
TL;DR
This work automates Thought of Search (ToS) by replacing human feedback with generic and domain-specific unit tests that guide LLMs to generate sound and complete successor functions and goal tests. AutoToS uses a structured, test-driven workflow, chain-of-thought (CoT) prompting, and lightweight BFS/DFS validation to produce correct search components with a small number of LLM calls. Across five planning domains and multiple models, AutoToS achieves 100% accuracy, showing that the expert feedback ToS relies on can be replaced by automated tests. The resulting components are verifiable and can be plugged into standard search algorithms.
Abstract
Large language models (LLMs) are being used to solve planning problems that require search. Most of the literature uses LLMs as world models to define the search space, forgoing soundness for the sake of flexibility. A recent work, Thought of Search (ToS), proposed defining the search space with code, having LLMs produce that code. ToS requires a human in the loop, collaboratively producing a sound successor function and goal test. The result, however, is worth the effort: all the tested datasets were solved with 100% accuracy. Consequently, there is great potential to automate the ToS process. We take a first major step towards automating ToS (AutoToS), taking the human out of the loop of interactions with the language model. AutoToS guides the language model step by step towards the generation of sound and complete search components, through feedback from both generic and domain-specific unit tests. We show that AutoToS is able to achieve 100% accuracy on all the evaluated domains with a small number of LLM calls.
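To make the terms "successor function", "goal test", and "unit-test feedback" concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a toy 24-game-style domain in which a state is a tuple of numbers and a move combines two of them with an arithmetic operator. All names (`make_state`, `successors`, `is_goal`, `check_components`, `bfs`) are hypothetical and chosen for illustration. In an automated loop, the messages returned by `check_components` would be the kind of feedback sent back to the language model; here the components are written by hand and no LLM is called.

```python
from collections import deque
from fractions import Fraction


def make_state(*nums):
    """Hypothetical state encoding: a sorted tuple of exact fractions."""
    return tuple(sorted(Fraction(n) for n in nums))


def successors(state):
    """Successor function: combine any ordered pair of numbers with + - * /."""
    result = set()
    n = len(state)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a, b = state[i], state[j]
            rest = [state[k] for k in range(n) if k not in (i, j)]
            candidates = [a + b, a - b, a * b]
            if b != 0:  # skip division by zero
                candidates.append(a / b)
            for c in candidates:
                result.add(tuple(sorted(rest + [c])))
    return result


def is_goal(state):
    """Goal test: exactly one number remains and it equals 24."""
    return len(state) == 1 and state[0] == 24


def check_components(successors, is_goal):
    """Run a few illustrative unit tests; return feedback messages (empty if all pass)."""
    feedback = []
    # Goal-test checks on one known goal and one known non-goal state.
    if not is_goal(make_state(24)):
        feedback.append("goal test rejects a known goal state")
    if is_goal(make_state(1, 24)):
        feedback.append("goal test accepts a known non-goal state")
    # Completeness-style check: a known valid transition must be generated.
    start = make_state(1, 2, 3, 4)
    generated = successors(start)
    if make_state(3, 3, 4) not in generated:  # result of 1 + 2
        feedback.append("successor function misses the transition 1 + 2")
    # Soundness-style check: every successor must have exactly one fewer number.
    for nxt in generated:
        if len(nxt) != len(start) - 1:
            feedback.append(f"invalid successor {nxt} of {start}")
    return feedback


def bfs(initial, successors, is_goal):
    """Plain breadth-first search; returns True if a goal state is reachable."""
    frontier = deque([initial])
    seen = {initial}
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            return True
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```

Once the checks pass, the same two functions can be handed unchanged to a standard search routine, for example `bfs(make_state(1, 2, 3, 4), successors, is_goal)`. Exact fractions are used instead of floats so that the goal comparison with 24 is not affected by rounding.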
