A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings

Xiaoang Xu, Shuo Wang, Xu Han, Zhenghao Liu, Huijia Wu, Peipei Li, Zhiyuan Liu, Maosong Sun, Zhaofeng He

TL;DR

A*-Thought tackles the inefficiency of long Chain-of-Thought reasoning in Large Reasoning Models (LRMs) by combining a step-level bidirectional importance score with a path-level A* search to identify compact, information-dense reasoning trajectories. The method scores individual thoughts and assembles a concise reasoning path through a cost-guided search that balances the quality of the current path against the estimated future cost. Experiments on math benchmarks show that it improves the performance of QwQ-32B by up to 2.39x under a low token budget and shortens the output by nearly 50% under a high budget, and that it generalizes to several other LRMs. The approach makes budget-conscious reasoning more practical and offers a foundation for extending to RL-based training and greener AI deployments.

Abstract

Large Reasoning Models (LRMs) achieve superior performance by extending the thought length. However, a lengthy thinking trajectory reduces efficiency. Most existing methods rest on the assumption that models overthink and attempt to reason efficiently by compressing the Chain-of-Thought (CoT), but this often degrades performance. To address this problem, we introduce A*-Thought, an efficient, unified, tree-search-based framework designed to identify and isolate the most essential thoughts from the extensive reasoning chains produced by these models. It formulates the reasoning process of LRMs as a search tree, where each node represents a reasoning span in the vast reasoning space. By combining the A* search algorithm with a cost function specific to the reasoning path, it can efficiently compress the chain of thought and determine a reasoning path with high information density and low cost. In addition, we propose a bidirectional importance estimation mechanism, which further refines this search process and makes it more efficient than uniform sampling. Extensive experiments on several advanced math tasks show that A*-Thought effectively balances performance and efficiency over a huge search space. Specifically, A*-Thought improves the performance of QwQ-32B by 2.39$\times$ under a low budget and reduces the output length by nearly 50% under a high budget. The proposed method is also compatible with several other LRMs, demonstrating its generalization capability. The code can be accessed at: https://github.com/AI9Stars/AStar-Thought.
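
The search idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation and does not use the paper's cost functions. It assumes each thinking step already carries a hypothetical importance score (a stand-in for the bidirectional importance score) and a token length, and it uses textbook A* to pick an order-preserving subset of steps whose total score reaches a target with the fewest tokens. The function name `compress_steps`, the `target` threshold, and the heuristic are all assumptions made for illustration.

```python
import heapq
from typing import List, Optional


def compress_steps(scores: List[float], lengths: List[int],
                   target: float) -> Optional[List[int]]:
    """Pick step indices (in original order) whose scores sum to at least
    `target` while minimising total token length. Returns None if impossible.

    Hypothetical toy formulation: g = tokens spent so far,
    h = optimistic estimate of the tokens still needed.
    """
    n = len(scores)
    # best_ratio[i]: cheapest tokens-per-unit-score among steps i..n-1.
    best_ratio = [float("inf")] * (n + 1)
    for i in range(n - 1, -1, -1):
        ratio = lengths[i] / scores[i] if scores[i] > 0 else float("inf")
        best_ratio[i] = min(best_ratio[i + 1], ratio)

    def heuristic(next_idx: int, gained: float) -> float:
        # Admissible: every remaining step costs at least best_ratio tokens
        # per unit of score, so this never overestimates the remaining cost.
        missing = target - gained
        if missing <= 0:
            return 0.0
        return missing * best_ratio[next_idx]

    # Frontier entries: (f = g + h, g, next index to consider, score, path).
    frontier = [(heuristic(0, 0.0), 0, 0, 0.0, ())]
    while frontier:
        _, g, next_idx, gained, path = heapq.heappop(frontier)
        if gained >= target:
            return list(path)  # first goal popped is optimal for this toy cost
        for j in range(next_idx, n):
            if scores[j] <= 0:
                continue  # a step with no score can only add cost
            g2 = g + lengths[j]
            gained2 = gained + scores[j]
            h2 = heuristic(j + 1, gained2)
            if h2 == float("inf"):
                continue  # target unreachable from this branch
            heapq.heappush(frontier, (g2 + h2, g2, j + 1, gained2, path + (j,)))
    return None


if __name__ == "__main__":
    # Made-up scores and lengths for five thinking steps.
    scores = [0.9, 0.1, 0.2, 0.8, 0.1]
    lengths = [40, 30, 50, 35, 60]
    print(compress_steps(scores, lengths, target=1.5))
```

With these made-up inputs the search keeps steps 0 and 3, which reach the target score in 75 tokens. In the framework described above, the scores would come from the importance estimation mechanism and the cost would be specific to the reasoning path, rather than a raw token count.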

Paper Structure

This paper contains 41 sections, 10 equations, 12 figures, 11 tables, 1 algorithm.

Figures (12)

  • Figure 1: Illustration of the comparison between the standard CoT and the proposed A*-Thought. In A*-Thought, each thinking step is assigned a bidirectional importance score (BIS), represented by varying color shades. Guided by the carefully-designed cost functions, A*-Thought efficiently arrives at the solution using fewer steps, reducing the redundancy inherent in the original CoT.
  • Figure 2: Illustration of A*-Thought, a long-CoT compression method. A*-Thought leverages signals at both the step and path levels. At the step-level, a bidirectional importance score assesses relevance to both the question and the solution. At the path-level, an A* search algorithm is employed, with cost functions designed to consider both current path quality and estimated future cost.
  • Figure 3: Distribution of BIS values for individual thinking steps in Long CoT.
  • Figure 4: ACU of different methods, which reflects the performance-to-efficiency ratio of LRMs.
  • Figure 5: Performance of R1-Distill-32B augmented using TokenSkip and A*-Thought. "Average" denotes the model's average accuracy on MATH500, AMC23, OlympiadBench, and GSM8K.
  • ...and 7 more figures