Domain-Specialized Tree of Thought through Plug-and-Play Predictors

Xuanqi Gao; Haoyu Wang; Jun Sun; Shiqing Ma; Chao Shen

Abstract

While Large Language Models (LLMs) have advanced complex reasoning, prominent methods like the Tree of Thoughts (ToT) framework face a critical trade-off between exploration depth and computational efficiency. Existing ToT implementations often rely on heavyweight LLM-based self-evaluation or rigid heuristics for branch pruning, making them prohibitively expensive and inflexible for broad application. To address this, we introduce DST, an adaptable, plug-and-play predictor that serves as a lightweight, supervised heuristic to guide the ToT search process. Our predictor enables dynamic, context-aware pruning, allowing the search to proceed with near-greedy efficiency on simpler reasoning steps while adaptively expanding the search beam only when encountering uncertainty or task complexity. We evaluate our approach on a diverse suite of benchmarks spanning mathematical reasoning, general reasoning, and complex logical reasoning. Experimental results demonstrate that our method achieves accuracy competitive with or superior to strong baselines, including standard ToT, while reducing computational overhead by 26-75%. Our work effectively resolves the accuracy-efficiency trade-off in tree-based reasoning, transforming ToT from a resource-intensive technique into a scalable and practical paradigm for complex problem-solving in LLMs.
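The pruning behaviour described above (stay near-greedy when the predictor is confident, widen the beam only under uncertainty) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `expand`, `score`, and `is_terminal`, the shortcut rule "keep only the top candidate when its score reaches `tau`", and the toy task are all assumptions made for illustration. The parameters `beam_width` and `tau` loosely mirror the beam width $b$ and pruning threshold $\tau$ named in the figure captions, but the exact rule DST uses is not given in this excerpt.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # toy stand-in for a partial reasoning chain


def adaptive_search(
    root: State,
    expand: Callable[[State], List[State]],   # hypothetical: proposes next thoughts
    score: Callable[[State], float],          # hypothetical: lightweight predictor
    is_terminal: Callable[[State], bool],
    beam_width: int = 3,
    tau: float = 0.8,
    max_depth: int = 10,
) -> Tuple[State, int]:
    """Predictor-guided beam search with a confidence shortcut (assumed rule).

    Returns the best state found and the number of states expanded,
    the latter serving as a rough proxy for generation cost.
    """
    frontier = [root]
    n_expanded = 0
    for _ in range(max_depth):
        if all(is_terminal(s) for s in frontier):
            break
        candidates: List[State] = []
        for s in frontier:
            if is_terminal(s):
                candidates.append(s)  # carry finished chains forward unchanged
            else:
                candidates.extend(expand(s))
                n_expanded += 1
        if not candidates:
            break
        ranked = sorted(candidates, key=score, reverse=True)
        if score(ranked[0]) >= tau:
            frontier = ranked[:1]           # confident: continue greedily
        else:
            frontier = ranked[:beam_width]  # uncertain: keep a wider beam
    return max(frontier, key=score), n_expanded


# Toy task (purely illustrative): recover the sequence TARGET one digit at a time.
TARGET: State = (1, 2, 0)


def toy_expand(s: State) -> List[State]:
    return [s + (d,) for d in range(3)]


def toy_score(s: State) -> float:
    # A perfect toy predictor: high score only for prefixes of TARGET.
    return 1.0 if s == TARGET[: len(s)] else 0.1


def toy_terminal(s: State) -> bool:
    return len(s) == len(TARGET)


greedy_best, greedy_cost = adaptive_search((), toy_expand, toy_score, toy_terminal, tau=0.8)
wide_best, wide_cost = adaptive_search((), toy_expand, toy_score, toy_terminal, tau=2.0)
```

In this toy setting both runs reach the same answer, but the run whose threshold is never met (`tau=2.0`) expands more states, because it keeps the full beam at every depth. That gap is the kind of saving the abstract attributes to confidence-based pruning, under the assumptions stated above.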

Paper Structure (38 sections, 4 equations, 5 figures, 7 tables, 2 algorithms)

Introduction
Background
Example.
Relationship to Graph-of-Thoughts.
Method
State Definition
Training Domain-Specialized Predictor
Data collection.
Training vs. Inference Requirements.
Predictor as Runtime Evaluator
Complexity analysis of pruning.
Experiment
Main Result
Experimental Setup.
Efficiency-Accuracy Trade-off Achievement.
...and 23 more sections

Figures (5)

Figure 1: Overview of DST.
Figure 2: Accuracy vs. Efficiency Trade-off. Each point represents the performance of a method on a specific task and model, plotted as accuracy gain (percentage points) versus efficiency gain (percentage cost reduction) relative to CoT.
Figure 3: Accuracy vs. Average Tokens as a function of Beam Width ($b$) on BBEH-BoardgameQA. The red dot marks our default setting of $b=3$, which offers a strong balance between performance and cost.
Figure 4: Top: Accuracy vs. Average Tokens as a function of Pruning Threshold ($\tau$) on GSM8K (left) and BBEH-BoardgameQA (right). Bottom: The corresponding Shortcut Rate for each dataset.
Figure 5: Accuracy vs. Average Tokens as a function of Discount Factor ($\gamma$) on GSM8K (left) and GPQA (right). The red marker indicates the optimal performance point, achieved at $\gamma=0.99$.
