EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning

Jiawei Liu, Qisi Chen, Jianshu Zhang, Quan Liu, Defu Lian

TL;DR

The paper addresses inefficiency in LLM-based search caused by exploring semantically equivalent reasoning steps, especially in mathematics. It introduces EquivPruner, a lightweight action-pruning module trained on MathEquiv that prunes redundant branches in Monte Carlo Tree Search (MCTS) and related search methods, and it reports significant token savings with maintained or improved accuracy on GSM8K and MATH. The authors also release MathEquiv, a dataset for mathematical statement equivalence, which supports both training the pruner and evaluating domain-specific equivalence detection. The results indicate that the approach generalizes to out-of-distribution (OOD) models and datasets, offering practical improvements in inference-time efficiency for math reasoning tasks.

Abstract

Large Language Models (LLMs) excel at complex reasoning through search algorithms, yet current strategies often suffer from massive token consumption due to redundant exploration of semantically equivalent steps. Existing semantic similarity methods struggle to accurately identify such equivalence in domain-specific contexts like mathematical reasoning. To address this, we propose EquivPruner, a simple yet effective approach that identifies and prunes semantically equivalent actions during LLM reasoning search. We also introduce MathEquiv, the first dataset we created for mathematical statement equivalence, which enables the training of a lightweight equivalence detector. Extensive experiments across various models and tasks demonstrate that EquivPruner significantly reduces token consumption, improving searching efficiency and often bolstering reasoning accuracy. For instance, when applied to Qwen2.5-Math-7B-Instruct on GSM8K, EquivPruner reduced token consumption by 48.1% while also improving accuracy. Our code is available at https://github.com/Lolo1222/EquivPruner.
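
The pruning idea in the abstract can be illustrated with a minimal sketch: given the sibling candidate steps produced at one search node, keep a candidate only if it is not equivalent to one already kept. This is not the authors' implementation. The function names `prune_equivalent` and `toy_equivalent` are hypothetical, and `toy_equivalent` is a deliberately crude token-matching placeholder standing in for the trained equivalence detector the paper describes.

```python
from typing import Callable, List


def prune_equivalent(
    candidates: List[str],
    is_equivalent: Callable[[str, str], bool],
) -> List[str]:
    """Greedily keep one representative per group of equivalent candidates.

    A candidate is kept only if the predicate reports it as non-equivalent
    to every candidate kept so far, so the first member of each group wins.
    """
    kept: List[str] = []
    for cand in candidates:
        if not any(is_equivalent(cand, k) for k in kept):
            kept.append(cand)
    return kept


def toy_equivalent(a: str, b: str) -> bool:
    """Placeholder predicate (an assumption, not the paper's model).

    Treats two steps as equivalent when they contain the same whitespace
    separated tokens in any order. A real detector would be a trained
    classifier over pairs of mathematical statements.
    """
    return sorted(a.lower().split()) == sorted(b.lower().split())


# Hypothetical sibling steps proposed at a single search node.
candidates = [
    "x = 3 + 5",
    "x = 5 + 3",  # same tokens as the first step, reordered
    "x = 3 * 5",  # different operator, so it survives pruning
]
pruned = prune_equivalent(candidates, toy_equivalent)
print(pruned)
```

In a tree search, only the surviving candidates would be expanded further, which is where the token savings would come from. The greedy pass costs at most one predicate call per pair of siblings.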

Paper Structure

This paper contains 29 sections, 4 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Illustration of the mathematical statement equivalence challenge during reasoning search. Given multiple candidate steps generated by an LLM, standard methods like embedding similarity or Levenshtein Ratio may incorrectly assess candidate 1 and candidate 2 as highly similar due to surface features, while failing to recognize the true semantic equivalence between candidate 2 and candidate 3, which represent the identical logical operation.
  • Figure 2: The EquivPruner framework. Top: Training the lightweight equivalence pruner from labeled step-level sentence pairs. Bottom: Applying the trained lightweight pruner during tree-search-based LLM inference to remove semantically equivalent candidates generated by the LLM.
  • Figure 3: Ablation study of EquivPruner components. The plot illustrates the impact of different pruning strategies within an MCTS framework on Token Consumption (bars, left y-axis) and Accuracy (line, right y-axis).
  • Figure 4: Complete prompt for labeling.
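
The surface-similarity failure described in the Figure 1 caption can be reproduced with a small sketch. The three candidate strings below are invented for illustration and are not taken from the paper. The `similarity` function uses one common normalization of edit distance (one minus the distance divided by the longer length), which is an assumption and may differ from the exact Levenshtein Ratio the authors refer to.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit-distance similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical candidate steps for solving x + 3 = 5.
cand1 = "x + 3 = 5, so x = 5 + 3"                  # wrong operation
cand2 = "x + 3 = 5, so x = 5 - 3"                  # correct operation
cand3 = "Subtract 3 from both sides to get x = 2"  # same operation as cand2

print(f"cand1 vs cand2: {similarity(cand1, cand2):.2f}")
print(f"cand2 vs cand3: {similarity(cand2, cand3):.2f}")
```

The first pair differs by a single character yet describes different operations, while the second pair describes the same operation in very different wording, so a pure string metric ranks them the wrong way round.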