THOUGHTSCULPT: Reasoning with Intermediate Revision and Search

Yizhou Chi; Kevin Yang; Dan Klein

TL;DR

THOUGHTSCULPT targets reasoning tasks in which a large language model benefits from iteratively revising its intermediate outputs. It proposes a general framework built around three modules (Thought Evaluator, Thought Generator, and Decision Simulator) and uses Monte Carlo Tree Search (MCTS) to explore a graph of thought nodes, where actions may revise earlier output instead of only extending it. Across Story Outline Improvement, Mini-Crosswords, and Constrained Generation, THOUGHTSCULPT consistently outperforms strong baselines, especially when paired with MCTS, while operating purely at inference time with no additional training. The approach offers a flexible, task-agnostic planner for long-form reasoning and creative generation and could apply to other domains that require structured, revisable outputs, at the cost of higher computational demand.

Abstract

We present THOUGHTSCULPT, a general reasoning and search method for tasks with outputs that can be decomposed into components. THOUGHTSCULPT explores a search tree of potential solutions using Monte Carlo Tree Search (MCTS), building solutions one action at a time and evaluating according to any domain-specific heuristic, which in practice is often simply an LLM evaluator. Critically, our action space includes revision actions: THOUGHTSCULPT may choose to revise part of its previous output rather than continuing to build the rest of its output. Empirically, THOUGHTSCULPT outperforms state-of-the-art reasoning methods across three challenging tasks: Story Outline Improvement (up to +30% interestingness), Mini-Crosswords Solving (up to +16% word success rate), and Constrained Generation (up to +10% concept coverage).
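To make the revision action space concrete, here is a minimal Python sketch, assuming an output is a tuple of text components (for example outline items or sentences). The action encoding (`"continue"` to append a component, `"revise"` to overwrite component `index`) and the helpers `apply_action` and `action_slots` are hypothetical names chosen for illustration, not the authors' code.

```python
from typing import List, Tuple, Union

State = Tuple[str, ...]
# Hypothetical action encoding:
#   ("continue", text)       -> append a new component after the existing ones
#   ("revise", index, text)  -> overwrite an earlier component in place
Action = Union[Tuple[str, str], Tuple[str, int, str]]


def apply_action(state: State, action: Action) -> State:
    """Return a new state; the parent state is never mutated."""
    kind = action[0]
    if kind == "continue":
        return state + (action[1],)
    if kind == "revise":
        _, index, text = action
        if not 0 <= index < len(state):
            raise IndexError(f"cannot revise component {index} of {len(state)}")
        return state[:index] + (text,) + state[index + 1:]
    raise ValueError(f"unknown action kind: {kind!r}")


def action_slots(state: State, max_components: int) -> List[tuple]:
    """Available action types: revise any existing component, or continue if room remains."""
    slots: List[tuple] = [("revise", i) for i in range(len(state))]
    if len(state) < max_components:
        slots.append(("continue",))
    return slots


# Hypothetical story-outline example: revise the first item, then extend the outline.
outline: State = ("A detective arrives in town.", "She finds a clue.")
outline = apply_action(outline, ("revise", 0, "A disgraced detective returns to her hometown."))
outline = apply_action(outline, ("continue", "The clue implicates her former partner."))
```

Returning new tuples rather than mutating in place keeps each thought node's state intact, which matters when several children branch from the same parent during search.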

Paper Structure (50 sections, 3 equations, 6 figures, 10 tables, 2 algorithms)

This paper contains 50 sections, 3 equations, 6 figures, 10 tables, 2 algorithms.

Introduction
Related Works
Feedback Guided Generation.
Graph Reasoning.
LM Planning.
Method
Thought Evaluator
Thought Generator
Decision Simulator
Experiments
Story Outline Improvement
Task Setup
Method Setup
Results
Continuous Improvement
...and 35 more sections

Figures (6)

Figure 1: Illustration of ThoughtSculpt using Monte Carlo Tree Search on the Constrained Generation task. Each circle represents a thought node generated by an LLM. Selection: a thought node $x$ is chosen by a selection algorithm. Expansion: a new set of child nodes $X$ is generated from the initial instruction, the current node, and self-evaluated textual feedback. The zoomed-in view of the expansion phase shows the Thought Evaluator and the Thought Generator assessing and refining the current solution for the Constrained Generation task. Simulation: a single node $x'$ is randomly chosen from $X$ and generates further nodes in sequence for several steps; this corresponds to the Decision Simulator. Backpropagation: the numerical feedback evaluated at the last node is propagated back to the root node.
Figure 2: Illustration of the Story Outline Improvement task. In each step, the Thought Evaluator conducts itemized evaluations of the story outline, and the Thought Generator produces a candidate set of improved story outlines.
Figure 3: Proportion of outlines generated by each method that were preferred by humans in pairwise comparison. ("Neither" indicates that neither ThoughtSculpt nor the baseline methods were preferred.)
Figure 4: Average outline interestingness at each step. ThoughtSculpt's interestingness increases more over successive steps than that of the baselines.
Figure 5: Illustration of a step in the deliberation process for the Mini-Crosswords task: the current crossword board is assessed with the Thought Evaluator, and a candidate set of words is proposed. One step is equal to one $d_{rollout}$.
...and 1 more figure
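The four phases in the Figure 1 caption can be sketched as a small Python loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `evaluate` and `generate` are hypothetical stand-ins for the LLM-backed Thought Evaluator and Thought Generator, UCT (UCB1 applied to trees) is assumed as the selection algorithm because the caption does not name one, and the toy concept-coverage task with its word list is invented for illustration.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

State = Tuple[str, ...]
Evaluator = Callable[[State], Tuple[float, List[str]]]  # -> (numeric score, textual feedback)
Generator = Callable[[State, List[str]], List[State]]   # -> candidate child states


@dataclass
class Node:
    state: State
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def uct(self, c: float) -> float:
        # Assumed UCT rule: unvisited children first, else mean value plus exploration bonus.
        if self.visits == 0:
            return float("inf")
        assert self.parent is not None
        return self.value_sum / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)


def mcts(root_state: State, evaluate: Evaluator, generate: Generator,
         iterations: int = 50, rollout_depth: int = 3, c: float = 1.4,
         seed: int = 0) -> Tuple[State, float]:
    rng = random.Random(seed)
    root = Node(root_state)
    best_state, best_score = root_state, evaluate(root_state)[0]
    for _ in range(iterations):
        # 1. Selection: descend by UCT until reaching a node with no children.
        node = root
        while node.children:
            node = max(node.children, key=lambda child: child.uct(c))
        # 2. Expansion: evaluator feedback conditions the generator's candidates.
        _, feedback = evaluate(node.state)
        node.children = [Node(s, parent=node) for s in generate(node.state, feedback)]
        # 3. Simulation: pick one new child at random, then roll out for up to rollout_depth steps.
        if node.children:
            node = rng.choice(node.children)
        state = node.state
        for _ in range(rollout_depth):
            score, feedback = evaluate(state)
            if score > best_score:
                best_state, best_score = state, score
            candidates = generate(state, feedback)
            if not candidates:
                break
            state = rng.choice(candidates)
        reward, _ = evaluate(state)
        if reward > best_score:
            best_state, best_score = state, reward
        # 4. Backpropagation: add the final numeric score to every node on the path to the root.
        walker: Optional[Node] = node
        while walker is not None:
            walker.visits += 1
            walker.value_sum += reward
            walker = walker.parent
    return best_state, best_score


# Hypothetical toy task: cover every required concept within MAX_WORDS words.
CONCEPTS = ["catch", "dog", "frisbee"]
MAX_WORDS = 3


def evaluate(state: State) -> Tuple[float, List[str]]:
    missing = [w for w in CONCEPTS if w not in state]  # textual feedback: missing concepts
    return 1 - len(missing) / len(CONCEPTS), missing


def generate(state: State, missing: List[str]) -> List[State]:
    candidates: List[State] = []
    if len(state) < MAX_WORDS:  # "continue" actions: append a missing concept
        candidates += [state + (w,) for w in missing]
    for i, word in enumerate(state):  # "revise" actions: replace filler or duplicate words
        if word not in CONCEPTS or state.index(word) != i:
            candidates += [state[:i] + (w,) + state[i + 1:] for w in missing]
    return candidates


best, score = mcts(("dog", "the", "ran"), evaluate, generate)
```

Here the start state already holds three words, so appending is impossible and only revision actions can raise coverage. The sketch returns the highest-scoring state seen during search; returning the most-visited child of the root is a common alternative design.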