Chopping Trees: Semantic Similarity Based Dynamic Pruning for Tree-of-Thought Reasoning

Joongho Kim, Xirui Huang, Zarreen Reza, Gabriel Grand

TL;DR

The paper tackles the substantial computational cost of Tree-of-Thought reasoning by addressing semantic redundancy in the search space. It introduces Semantic Similarity-Based Dynamic Pruning (SSDP), which merges semantically similar intermediate steps into hypernodes online, during inference, within a Dynamic Parallel Tree Search framework. SSDP reduces latency (up to a 2.3× speedup over tree-search baselines) and cuts the number of explored nodes by 85–90% while keeping accuracy competitive on GSM8K and MATH500 with Llama-3 and Qwen models. The approach relies on a lightweight embedding-based semantic merging module and a single reward model, offering a practical pathway to scalable, real-time, deliberative reasoning in large language models.

Abstract

Tree-of-Thought (ToT) reasoning boosts the problem-solving abilities of Large Language Models (LLMs) but is computationally expensive due to semantic redundancy, where distinct branches explore equivalent reasoning paths. We introduce Semantic Similarity-Based Dynamic Pruning (SSDP), a lightweight method that, to the best of our knowledge, is the first framework to integrate online semantic merging into parallelized tree search, enabling the clustering and pruning of redundant steps in real time. Across reasoning benchmarks, including GSM8K and MATH500, SSDP achieves up to a 2.3x speedup over state-of-the-art tree-search baselines while maintaining competitive accuracy (typically within 5% of the strongest baseline) and reducing the number of explored nodes by 85-90%, demonstrating a practical approach to efficient, scalable LLM reasoning. The implementation of SSDP is publicly available at https://github.com/kimjoonghokim/SSDP.
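
To make the idea of online semantic merging concrete, here is a minimal Python sketch of one way redundant steps could be clustered by embedding similarity. It is an illustration under stated assumptions, not the authors' implementation (see the repository linked above for that). The `HyperNode` class, the `merge_steps` function, the greedy assignment rule, the choice to keep the highest-reward step as the representative, and the default threshold `tau=0.9` are all hypothetical; the embeddings are hand-written toy vectors standing in for the output of an embedding model.

```python
import math
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class HyperNode:
    """Hypothetical cluster of near-duplicate reasoning steps."""
    representative: str          # text of the step kept for further expansion
    embedding: List[float]       # embedding of the representative
    reward: float                # reward-model score of the representative
    members: List[str] = field(default_factory=list)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def merge_steps(
    steps: Iterable[Tuple[str, List[float], float]], tau: float = 0.9
) -> List[HyperNode]:
    """Greedily merge steps whose similarity to a cluster is at least tau.

    Each step is (text, embedding, reward). Steps arrive one at a time,
    so the clustering is online: no step is revisited after assignment.
    """
    clusters: List[HyperNode] = []
    for text, emb, reward in steps:
        best, best_sim = None, tau
        for c in clusters:
            sim = cosine(emb, c.embedding)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            # No sufficiently similar cluster: start a new hypernode.
            clusters.append(HyperNode(text, list(emb), reward, [text]))
        else:
            best.members.append(text)
            # Assumption: the highest-reward member represents the cluster.
            if reward > best.reward:
                best.representative = text
                best.embedding = list(emb)
                best.reward = reward
    return clusters


# Toy candidates: the first two say the same thing in different words.
steps = [
    ("Add 3 and 4 to get 7", [1.0, 0.0, 0.1], 0.8),
    ("3 + 4 = 7", [0.98, 0.05, 0.1], 0.9),
    ("Multiply 3 by 4 to get 12", [0.0, 1.0, 0.2], 0.3),
]
clusters = merge_steps(steps, tau=0.9)
for c in clusters:
    print(c.representative, len(c.members))
```

In this sketch only one representative per hypernode would be expanded further, which is where the reduction in explored nodes would come from. The threshold `tau` trades redundancy removal against the risk of merging steps that are actually distinct.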

Paper Structure

This paper contains 30 sections, 1 equation, 3 figures, 6 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of SSDP. (a) Flowchart of the SSDP algorithm showing its iterative expand–score–prune–insert loop. (b) Example search tree where semantically similar nodes are merged into representative clusters, reducing redundant exploration.
  • Figure 2: Pareto Front Analysis on Similarity Threshold $\tau$
  • Figure :