Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls

Ante Wang; Linfeng Song; Ye Tian; Dian Yu; Haitao Mi; Xiangyu Duan; Zhaopeng Tu; Jinsong Su; Dong Yu

TL;DR

This work tackles inefficiencies in verifier-guided tree search for LLM reasoning, identifying two causes: over-exploration from redundant states and under-exploration from high variance in verifier scores. It introduces FETCH, a plug-in framework that combines semantic state merging via agglomerative clustering of embeddings (post-processed with signals from prompting or consistency checks) with two variance reduction techniques: TD($\lambda$) training for verifiers and ensemble scoring at inference. Empirically, FETCH reduces token costs and boosts accuracy across BFS, Beam Search, Tree Search, and MCTS on the GSM8K, GSM-Plus, and MATH datasets, with state merging cutting costs by up to roughly 3x and variance reduction providing consistent 1–2 point gains. The results suggest FETCH can make guided tree-search reasoning more efficient and scalable, enabling broader practical deployment in complex domains.
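
The state-merging idea can be illustrated with a minimal sketch. It assumes each candidate reasoning state already has an embedding vector; the toy 2-D vectors, the average-linkage rule, the cosine distance, and the `threshold` value are all illustrative assumptions rather than the configuration used in the paper, and `merge_states` is a hypothetical helper name.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def merge_states(embeddings, threshold):
    """Group state indices by average-linkage agglomerative clustering.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (an assumed hyperparameter).
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(a, b):
        # Average pairwise distance between members of two clusters.
        dists = [cosine_distance(embeddings[i], embeddings[j])
                 for i in a for j in b]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # remaining clusters are treated as distinct states
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy embeddings standing in for sentence embeddings of three states.
toy_embeddings = [
    [1.0, 0.0],    # state 0
    [0.99, 0.05],  # state 1: almost the same direction as state 0
    [0.0, 1.0],    # state 2: a different direction
]
groups = merge_states(toy_embeddings, threshold=0.1)
print(groups)
```

In a tree search, each resulting group would be expanded once instead of once per member, which is where the token savings would come from under these assumptions.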

Abstract

Recent advancements in tree search algorithms guided by verifiers have significantly enhanced the reasoning capabilities of large language models (LLMs), but at the cost of increased computational resources. In this work, we identify two key challenges contributing to this inefficiency: $\textit{over-exploration}$ due to redundant states with semantically equivalent content, and $\textit{under-exploration}$ caused by high variance in verifier scoring leading to frequent trajectory switching. To address these issues, we propose FETCH, an e$\textbf{f}$fici$\textbf{e}$nt $\textbf{t}$ree sear$\textbf{ch}$ framework, which is a flexible, plug-and-play system compatible with various tree search algorithms. Our framework mitigates over-exploration by merging semantically similar states using agglomerative clustering of text embeddings obtained from a fine-tuned SimCSE model. To tackle under-exploration, we enhance verifiers by incorporating temporal difference learning with adjusted $\lambda$-returns during training to reduce variance, and employing a verifier ensemble to aggregate scores during inference. Experiments on GSM8K, GSM-Plus, and MATH datasets demonstrate that our methods significantly improve reasoning accuracy and computational efficiency across four different tree search algorithms, paving the way for more practical applications of LLM-based reasoning. The code is available at https://github.com/Soistesimmer/Fetch.
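
To make the variance-reduction idea concrete, the sketch below computes textbook $\lambda$-returns as training targets for a verifier and averages scores from several verifiers. It is not the paper's adjusted $\lambda$-return, whose details are not given in the abstract. It assumes a single reward at the end of the trajectory, no discounting, and simple mean aggregation; `lambda_returns` and `ensemble_score` are hypothetical names.

```python
def lambda_returns(values, final_reward, lam):
    """Textbook lambda-return targets for one reasoning trajectory.

    values[t] is the current verifier estimate for the state after step t.
    Assumptions: the only reward is `final_reward` at the end of the
    trajectory, and the discount factor is 1.
    Recursion: G[t] = (1 - lam) * V[t + 1] + lam * G[t + 1],
    where the terminal value and terminal return both equal the final reward.
    """
    targets = [0.0] * len(values)
    next_value = final_reward
    next_return = final_reward
    for t in reversed(range(len(values))):
        targets[t] = (1 - lam) * next_value + lam * next_return
        next_return = targets[t]
        next_value = values[t]
    return targets


def ensemble_score(scores):
    """Aggregate scores from several verifiers by a plain mean (assumed)."""
    return sum(scores) / len(scores)


step_values = [0.2, 0.6, 0.9]  # toy verifier estimates for three steps
print(lambda_returns(step_values, final_reward=1.0, lam=1.0))  # Monte Carlo targets
print(lambda_returns(step_values, final_reward=1.0, lam=0.0))  # one-step TD targets
print(ensemble_score([0.8, 0.6, 0.7]))
```

With `lam=1.0` every target equals the final reward (a Monte Carlo target), while `lam=0.0` bootstraps each target from the next state's estimate; intermediate values trade off between the two.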
