DTS: Enhancing Large Reasoning Models via Decoding Tree Sketching

Zicheng Xu, Guanchu Wang, Yu-Neng Chuang, Guangyao Zheng, Alexander S. Szalay, Zirui Liu, Vladimir Braverman

TL;DR

This work tackles overthinking in large reasoning models, starting from the observation that shorter reasoning paths tend to be more accurate. It introduces Decoding Tree Sketching (DTS), a training-free, model-agnostic decoding framework that branches only at high-entropy tokens and stops at the first (and therefore shortest) completed path, approximating the optimal reasoning trajectory. On AIME2024 and AIME2025 with DeepSeek-R1-Distill-Qwen-7B and 1.5B, DTS improves accuracy by up to 8%, reduces average reasoning length by about 23%, and decreases repetition frequency by 12%, without additional training or supervision. Because the branches of the sketched tree are decoded in parallel, the approach is practical for real-world inference where computation and latency matter.

Abstract

Large Reasoning Models (LRMs) demonstrate strong performance on complex reasoning tasks, yet they often suffer from overthinking, producing excessively long chain-of-thought (CoT) traces that increase inference cost and may degrade accuracy. Our analysis reveals a clear anti-correlation between reasoning length and accuracy, where across multiple stochastic decodes, the short reasoning paths consistently achieve the highest correctness, while longer ones accumulate errors and repetitions. These short optimal reasoning paths can be found ideally through full enumeration of the reasoning space. However, the tree-structured reasoning space grows exponentially with sequence length, rendering exhaustive exploration infeasible. To address this, we propose DTS, a model-agnostic decoding framework that sketches the reasoning space by selectively branching at high-entropy tokens and applies early stopping to select the shortest completed reasoning path. This approach approximates the optimal solution that enhances both efficiency and accuracy, without requiring additional training or supervision. Experiments on AIME2024 and AIME2025 datasets with DeepSeek-R1-Distill-Qwen-7B and 1.5B show that DTS improves accuracy by up to 8%, reduces average reasoning length by 23%, and decreases repetition frequency by 12%, demonstrating DTS's ability for scalable and efficient LRM reasoning.
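The branching and early-stopping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`dts_decode`, `toy_model`), the threshold value `tau=0.5`, the `top_k` expansion rule, the `max_branches` cap, and the choice of greedy decoding at low-entropy steps are all assumptions made for illustration, and a hand-written toy distribution stands in for a real LRM.

```python
import math
from typing import Callable, Dict, List, Optional

Dist = Dict[str, float]  # maps token -> probability


def entropy(dist: Dist) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def dts_decode(
    next_dist: Callable[[List[str]], Dist],
    prompt: List[str],
    tau: float = 0.5,        # hypothetical entropy threshold
    top_k: int = 2,          # hypothetical number of children per branch point
    eos: str = "<eos>",
    max_steps: int = 50,
    max_branches: int = 64,  # hypothetical cap on the frontier size
) -> Optional[List[str]]:
    """Grow all branches one token per step; return the first branch that ends."""
    frontier = [list(prompt)]
    for _ in range(max_steps):
        new_frontier = []
        for seq in frontier:
            dist = next_dist(seq)
            if entropy(dist) >= tau:
                # Uncertain step: branch on the top_k most probable tokens.
                tokens = sorted(dist, key=dist.get, reverse=True)[:top_k]
            else:
                # Confident step: follow the single most probable token.
                tokens = [max(dist, key=dist.get)]
            for tok in tokens:
                new_seq = seq + [tok]
                if tok == eos:
                    # All branches have equal length at this step, so the
                    # first one to finish is also the shortest completed path.
                    return new_seq
                new_frontier.append(new_seq)
        frontier = new_frontier[:max_branches]
    return None  # no branch finished within max_steps


def toy_model(seq: List[str]) -> Dist:
    """Hand-written stand-in for a model: one uncertain step, then confident ones."""
    last = seq[-1]
    if last == "Q":
        return {"long": 0.55, "short": 0.45}   # high entropy: branch here
    if last == "short":
        return {"<eos>": 0.95, "long": 0.05}   # low entropy
    if last == "long":
        return {"long2": 0.9, "<eos>": 0.1}    # low entropy
    return {"<eos>": 1.0}


sketched = dts_decode(toy_model, ["Q"], tau=0.5)   # branches at the first step
greedy = dts_decode(toy_model, ["Q"], tau=10.0)    # threshold never reached
print(len(sketched), len(greedy))
```

In this toy setting, setting the threshold too high to ever trigger reduces the procedure to plain greedy decoding, which commits to the slightly more probable but longer continuation, while branching at the single uncertain step lets the shorter path finish first.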

Paper Structure

This paper contains 28 sections, 2 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: DTS effectively improves accuracy by $8\%$ and $7.3\%$, and reduces repetition rate by $5.3\%$ and $10\%$ on AIME2024 and AIME2025, respectively.
  • Figure 2: (a) Accuracy vs. response length with a linear regression fit. Each dot represents a single inference run. (b) Generation of the decoding tree by DTS. DTS expands new branches whenever the next-token entropy satisfies $H(v) \geq \tau$.
  • Figure 3: An example of the DTS decoding process, given the input prompt 'What’s the area of a rectangle with length 12 and width 9?'. DTS generates new branches at steps $t_1$ and $t_2$, and stops as soon as any branch terminates with an ending token. The final output is 'The area is length × width. Here, length = 12 and width = 9. So area = 12 × 9 = 108'.
  • Figure 4: A case study illustrating how DTS produces a concise and correct solution, while standard inference overthinks and enters an endless repetition loop, failing to reach a conclusion after consuming the maximum 32,678 tokens.
  • Figure : Decoding Tree Sketching (DTS)
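
The branching rule $H(v) \geq \tau$ in the Figure 2 caption can be made concrete with the standard Shannon entropy of the next-token distribution. The notation below is an assumption, since the caption does not define it: $v$ is read as the current node of the decoding tree, $\mathcal{V}$ as the vocabulary, and $p(x \mid v)$ as the model's probability of token $x$ given the prefix ending at $v$. The threshold $\tau = 0.5$ is an illustrative value, not one taken from the paper.

```latex
H(v) = -\sum_{x \in \mathcal{V}} p(x \mid v) \,\log p(x \mid v)

% Worked example with natural logarithms and an illustrative \tau = 0.5:
% two equally likely tokens:  H = -2 \cdot 0.5 \ln 0.5 = \ln 2 \approx 0.693 \geq \tau  (branch)
% probabilities (0.95, 0.05): H = -(0.95 \ln 0.95 + 0.05 \ln 0.05) \approx 0.199 < \tau  (no branch)
```

Under this reading, branching is triggered only where the model is genuinely undecided between continuations, which keeps the sketched tree small compared with full enumeration.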