Chopping Trees: Semantic Similarity Based Dynamic Pruning for Tree-of-Thought Reasoning
Joongho Kim, Xirui Huang, Zarreen Reza, Gabriel Grand
TL;DR
The paper tackles the substantial computational cost of Tree-of-Thought reasoning by addressing semantic redundancy in the search space. It introduces Semantic Similarity-Based Dynamic Pruning (SSDP), which merges semantically similar intermediate steps into hypernodes online, during inference, within a Dynamic Parallel Tree Search framework. SSDP achieves speedups of up to 2.3× over state-of-the-art tree-search baselines and reduces the number of explored nodes by 85–90%, while keeping accuracy competitive (typically within 5% of the strongest baseline) on GSM8K and MATH500 with Llama-3 and Qwen models. The approach relies on a lightweight embedding-based semantic merging module and a single reward model, offering a practical pathway to scalable, real-time, deliberative reasoning in large language models.
Abstract
Tree-of-Thought (ToT) reasoning boosts the problem-solving abilities of Large Language Models (LLMs) but is computationally expensive due to semantic redundancy, where distinct branches explore equivalent reasoning paths. We introduce Semantic Similarity-Based Dynamic Pruning (SSDP), a lightweight method that, to the best of our knowledge, is the first framework to integrate online semantic merging into parallelized tree search, enabling the clustering and pruning of redundant steps in real time. Across reasoning benchmarks, including GSM8K and MATH500, SSDP achieves up to a 2.3x speedup over state-of-the-art tree-search baselines while maintaining competitive accuracy (typically within 5% of the strongest baseline) and reducing the number of explored nodes by 85-90%, demonstrating a practical approach to efficient, scalable LLM reasoning. The implementation of SSDP is publicly available at https://github.com/kimjoonghokim/SSDP.
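To make the idea of online semantic merging concrete, the following is a minimal sketch, not the authors' implementation. It assumes each candidate reasoning step has already been mapped to an embedding vector, and it groups steps whose cosine similarity to a group's first member exceeds a threshold. The function name `merge_into_hypernodes`, the greedy first-match strategy, and the threshold value of 0.9 are all illustrative assumptions; the abstract does not specify them.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def merge_into_hypernodes(
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed value, chosen only for illustration
) -> List[List[int]]:
    """Greedily group step indices whose embeddings are near-duplicates.

    Each returned group stands in for one hypernode. The first index in a
    group acts as its representative; later steps join the first group
    whose representative is at least `threshold` similar.
    """
    hypernodes: List[List[int]] = []
    for i, emb in enumerate(embeddings):
        for group in hypernodes:
            if cosine(emb, embeddings[group[0]]) >= threshold:
                group.append(i)  # redundant step: merge instead of expanding
                break
        else:
            hypernodes.append([i])  # semantically new step: start a hypernode
    return hypernodes


# Toy 2-D "embeddings": steps 0 and 1 are near-duplicates, as are 2 and 3.
candidates = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.02, 1.0]]
groups = merge_into_hypernodes(candidates)
print(groups)
```

In this toy case four candidate steps collapse into two groups, so only two nodes would need to be expanded further. In a real system the vectors would come from a sentence-embedding model, and the surviving hypernodes would then be scored by the reward model mentioned above.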
