Automating Thought of Search: A Journey Towards Soundness and Completeness

Daniel Cao; Michael Katz; Harsha Kokel; Kavitha Srinivas; Shirin Sohrabi

TL;DR

This work automates Thought of Search (ToS) by replacing human feedback with automated generic and domain-specific unit tests that guide LLMs to generate sound and complete successor functions and goal tests. AutoToS combines a structured, test-driven workflow, chain-of-thought (CoT) prompting, and lightweight BFS/DFS validation to produce correct planning components with a small number of LLM calls. Across five planning domains and multiple models, AutoToS achieves near-100% accuracy, showing that LLM-based planning can be automated with less need for expert input. The resulting components are verifiable and can be plugged into standard search algorithms.

Abstract

Large language models (LLMs) are being used to solve planning problems that require search. Most of the literature uses LLMs as world models to define the search space, forgoing soundness for the sake of flexibility. A recent work, Thought of Search (ToS), proposed defining the search space with code, having LLMs produce that code. ToS requires a human in the loop, collaboratively producing a sound successor function and goal test. The result, however, is worth the effort: all the tested datasets were solved with 100% accuracy. Consequently, there is great potential to automate the ToS process. We take a first major step towards automating ToS (AutoToS), taking the human out of the loop of interactions with the language model. AutoToS guides the language model step by step towards the generation of sound and complete search components, through feedback from both generic and domain-specific unit tests. We show that AutoToS is able to achieve 100% accuracy on all the evaluated domains with a small number of LLM calls.
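To make the terms "successor function", "goal test", and "unit-test feedback" concrete, here is a minimal sketch for the 24 Game, one of the domains listed in the paper outline. It is an illustration under stated assumptions, not the code AutoToS generates or the tests it runs: the names `successors`, `is_goal`, `bfs`, and `check_successors`, the state representation (a sorted tuple of exact fractions), and the feedback strings are all hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def is_goal(state):
    """Goal test: exactly one number is left and it equals 24."""
    return len(state) == 1 and state[0] == 24


def successors(state):
    """Replace any two numbers by the result of +, -, * or / on them."""
    result = set()
    for i, j in combinations(range(len(state)), 2):
        a, b = state[i], state[j]
        rest = [x for k, x in enumerate(state) if k not in (i, j)]
        candidates = {a + b, a * b, a - b, b - a}
        if b != 0:
            candidates.add(a / b)
        if a != 0:
            candidates.add(b / a)
        for value in candidates:
            # Sorting gives one canonical tuple per multiset of numbers.
            result.add(tuple(sorted(rest + [value])))
    return result


def bfs(initial, successors, is_goal):
    """Plain breadth-first search; returns True if a goal state is reachable."""
    start = tuple(sorted(Fraction(x) for x in initial))
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        if is_goal(state):
            return True
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def check_successors(state, known_successor):
    """Hypothetical domain-specific test that returns feedback messages."""
    state = tuple(sorted(Fraction(x) for x in state))
    known = tuple(sorted(Fraction(x) for x in known_successor))
    produced = successors(state)
    feedback = []
    # Soundness-style check: every produced state must be a legal result.
    if any(len(s) != len(state) - 1 for s in produced):
        feedback.append("soundness: a successor must have one number fewer")
    # Completeness-style check: a known legal successor must be produced.
    if known not in produced:
        feedback.append("completeness: a known successor is missing")
    return feedback
```

With these definitions, `bfs([3, 3, 8, 8], successors, is_goal)` finds a solution only because the states use exact fractions (8 / (3 - 8/3) = 24), and `check_successors([1, 1, 4, 6], [1, 1, 24])` returns an empty list. In an automated loop of the kind the abstract describes, a non-empty feedback list would be sent back to the language model in place of a human reviewer's comment.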

Paper Structure (12 sections, 5 figures, 1 table)

Introduction
Related Works
Background
Proposed Approach and Methodology
Experiments
Evaluation Accuracy
Number of LLM calls
Code Errors Discussion
Conclusions, Limitations, Societal Impact, and Future Work
Additional data for experimental domains
24 Game
Goal Unit Test

Figures (5)

Figure 1: An overview of ToS and AutoToS.
Figure 2: 24 Game example feedback.
Figure 3: Accuracy progression during AutoToS.
Figure 4: Average number of feedback calls for goal correctness, successor soundness/completeness.
Figure 5: Partition of the errors by types in the generated code.

Theorems & Definitions (1)

Definition 1: Soundness and completeness
