
THOUGHTSCULPT: Reasoning with Intermediate Revision and Search

Yizhou Chi, Kevin Yang, Dan Klein

TL;DR

THOUGHTSCULPT addresses the need for large language models to reason iteratively and revise their intermediate outputs. It proposes a general framework built around three modules (Thought Evaluator, Thought Generator, and Decision Simulator) and uses Monte Carlo Tree Search (MCTS) to navigate a graph of thought nodes whose actions include revisions. Across Story Outline Improvement, Mini-Crosswords, and Constrained Generation, THOUGHTSCULPT consistently outperforms strong baselines, especially when using MCTS, while remaining inference-only and requiring no additional training. The approach offers a flexible, task-agnostic planner for long-form reasoning and creative generation that may benefit other domains requiring structured, revisable outputs, at the cost of higher computational demand.
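To make the division of labor among the three modules concrete, here is a minimal sketch under loudly stated assumptions: the task is a toy stand-in for Constrained Generation (mention every concept in a fixed list), and the LLM-backed modules are replaced by hypothetical rule-based functions. The names `thought_evaluator`, `thought_generator`, and `decision_simulator` are illustrative, not the authors' API. The evaluator returns both a numerical score and textual feedback, the generator conditions on that feedback, and the simulator rolls a candidate forward a fixed number of steps.

```python
from dataclasses import dataclass

# Hypothetical toy task: the output is a list of sentences, and its quality
# is the fraction of required concepts it mentions (a concept-coverage proxy).
CONCEPTS = ["dog", "frisbee", "park"]


@dataclass
class Feedback:
    score: float        # numerical feedback, usable as a search reward
    missing: list[str]  # textual feedback, passed on to the generator


def thought_evaluator(output: list[str]) -> Feedback:
    """Rule-based stand-in for an LLM evaluator: score coverage, list gaps."""
    text = " ".join(output).lower()
    missing = [c for c in CONCEPTS if c not in text]
    return Feedback(score=1 - len(missing) / len(CONCEPTS), missing=missing)


def thought_generator(output: list[str], fb: Feedback, k: int = 2) -> list[list[str]]:
    """Rule-based stand-in for an LLM generator: propose up to k refined
    candidates, each addressing one item of the evaluator's feedback."""
    return [output + [f"There was a {c}."] for c in fb.missing[:k]]


def decision_simulator(output: list[str], depth: int) -> float:
    """Roll a candidate forward for `depth` steps (here: always taking the
    first proposal) and return the evaluator's score of the final state."""
    for _ in range(depth):
        candidates = thought_generator(output, thought_evaluator(output), k=1)
        if not candidates:  # nothing left to fix
            break
        output = candidates[0]
    return thought_evaluator(output).score


draft = ["A dog ran."]
fb = thought_evaluator(draft)
print(fb.missing)                              # ['frisbee', 'park']
print(decision_simulator(draft, depth=2))      # 1.0
```

In the paper's setting each function would prompt an LLM; the sketch only shows how feedback flows from evaluator to generator and how a simulated final score can serve as the reward that search propagates.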

Abstract

We present THOUGHTSCULPT, a general reasoning and search method for tasks with outputs that can be decomposed into components. THOUGHTSCULPT explores a search tree of potential solutions using Monte Carlo Tree Search (MCTS), building solutions one action at a time and evaluating according to any domain-specific heuristic, which in practice is often simply an LLM evaluator. Critically, our action space includes revision actions: THOUGHTSCULPT may choose to revise part of its previous output rather than continuing to build the rest of its output. Empirically, THOUGHTSCULPT outperforms state-of-the-art reasoning methods across three challenging tasks: Story Outline Improvement (up to +30% interestingness), Mini-Crosswords Solving (up to +16% word success rate), and Constrained Generation (up to +10% concept coverage).
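What distinguishes this action space is that an action may overwrite an already-built component instead of only filling the next empty one. The sketch below assumes a hypothetical representation (a tuple with one slot per component, for example one crossword answer per slot, with `None` meaning not yet built) and a hypothetical `proposals` mapping standing in for LLM-generated candidates. It shows how "extend" and "revise" actions fall out of the same enumeration; `Action`, `actions`, and `apply_action` are illustrative names only.

```python
from typing import NamedTuple, Optional

# Hypothetical representation: one slot per output component
# (e.g. one crossword answer); None marks a component not yet built.
State = tuple[Optional[str], ...]


class Action(NamedTuple):
    kind: str   # "extend" fills an empty slot, "revise" overwrites a filled one
    index: int  # which component the action touches
    value: str  # the new content for that component


def actions(state: State, proposals: dict[int, list[str]]) -> list[Action]:
    """Turn per-component candidates (stand-ins for LLM proposals) into actions.

    Filled components stay eligible, so revising earlier output is just
    another action the search can choose.
    """
    result = []
    for i, values in proposals.items():
        kind = "extend" if state[i] is None else "revise"
        # Skip proposals that would leave the component unchanged.
        result.extend(Action(kind, i, v) for v in values if v != state[i])
    return result


def apply_action(state: State, action: Action) -> State:
    """Return a new state; the old one is untouched, so tree nodes can share it."""
    slots = list(state)
    slots[action.index] = action.value
    return tuple(slots)


board = ("CRANE", None, None)
for a in actions(board, {0: ["CRATE", "CRANE"], 1: ["ALOFT"]}):
    print(a.kind, apply_action(board, a))
# revise ('CRATE', None, None)
# extend ('CRANE', 'ALOFT', None)
```

Keeping states immutable is a design choice that suits tree search: a revision creates a sibling state rather than mutating its parent, so earlier nodes remain valid for backtracking.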

Paper Structure (50 sections, 3 equations, 6 figures, 10 tables, 2 algorithms)

Figures (6)

  • Figure 1: Illustration of ThoughtSculpt using Monte Carlo Tree Search on the Constrained Generation task. Each circle in the diagram represents a thought node generated by an LLM. Selection: a thought node $x$ is chosen according to a selection algorithm. Expansion: a new set of child nodes $X$ is generated from the initial instruction, the current node, and self-evaluated textual feedback; the zoomed-in view of this phase shows the Thought Evaluator assessing the current solution and the Thought Generator refining it. Simulation: a single node $x'$ is randomly chosen from $X$ and generates further nodes in sequence for several steps, corresponding to the Decision Simulator. Backpropagation: the numerical feedback evaluated at the last node is propagated back to the root node.
  • Figure 2: Illustration of our Story Outline Improvement task. In each step, the thought evaluator produces an itemized evaluation of the story outline, and the thought generator produces a candidate set of improved story outlines.
  • Figure 3: Proportion of outlines generated by each method that were preferred by humans in pairwise comparison. ("Neither" indicates that neither ThoughtSculpt nor the baseline methods were preferred.)
  • Figure 4: Average outline interestingness at each step. ThoughtSculpt's interestingness increases more across steps than that of the baselines.
  • Figure 5: Illustration of one step of the deliberation process in the Mini-Crosswords task: the thought evaluator assesses the current crossword board, and a candidate set of words is proposed. One step equals one $d_{rollout}$.
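The four phases in the Figure 1 caption map onto a standard MCTS loop. The sketch below is an illustration, not the authors' implementation, and rests on several assumptions: UCT (UCB1) is used as the selection algorithm, since the caption only says "a selection algorithm"; hypothetical `expand` and `evaluate` callables stand in for the Thought Generator and Thought Evaluator; a random rollout of `d_rollout` steps stands in for the Decision Simulator; and the function returns the best state evaluated anywhere during search, which is one of several reasonable choices.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, Hashable


@dataclass
class Node:
    state: Hashable
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0  # sum of backed-up rewards


def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """UCB1 score (assumed selection rule); unvisited children go first."""
    if child.visits == 0:
        return math.inf
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def mcts(root_state: Hashable,
         expand: Callable[[Hashable], list],      # stand-in for the generator
         evaluate: Callable[[Hashable], float],   # stand-in for the evaluator
         n_iter: int = 100, d_rollout: int = 2, seed: int = 0):
    rng = random.Random(seed)
    root = Node(root_state)
    best = [root_state, evaluate(root_state)]

    def track(state: Hashable) -> float:
        score = evaluate(state)
        if score > best[1]:
            best[:] = [state, score]
        return score

    for _ in range(n_iter):
        # 1. Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch, n=node.visits: uct(ch, n))
        # 2. Expansion: generate the leaf's children and score them.
        node.children = [Node(s, parent=node) for s in expand(node.state)]
        for ch in node.children:
            track(ch.state)
        # 3. Simulation: pick one child at random, roll out d_rollout steps.
        if node.children:
            node = rng.choice(node.children)
        state = node.state
        for _ in range(d_rollout):
            nxt = expand(state)
            if not nxt:
                break
            state = rng.choice(nxt)
        reward = track(state)
        # 4. Backpropagation: add the final reward along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best[0], best[1]


def flips(s: tuple) -> list[tuple]:
    """Toy action space in which every action revises one existing bit."""
    return [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]


best_state, best_score = mcts((0, 0, 0), flips, lambda s: sum(s) / len(s), n_iter=200)
print(best_state, best_score)  # (1, 1, 1) 1.0
```

Because unvisited children receive an infinite UCT score, the search expands every shallow node before exploiting, which on this toy reaches the all-ones state well within 200 iterations. Replacing `flips` and the lambda with LLM calls would recover the structure the caption describes, at a correspondingly higher cost per iteration.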