
Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning

Shangziqi Zhao, Jiahao Yuan, Jinyang Wu, Zhenglin Wang, Guisong Yang, Usman Naseem

TL;DR

Long-CoT reasoning improves accuracy, but its verbose, self-reflective traces distill poorly into small language models (SLMs). The authors introduce Prune-on-Logic, a structure-aware framework that converts Long-CoT traces into logic graphs and prunes low-utility steps under self-verification constraints. Across two distilled LLMs, three pruning strategies, and multiple benchmarks, verification pruning consistently boosts accuracy while reducing token usage, whereas pruning core reasoning steps or pruning indiscriminately degrades performance; gains scale with model capacity. This work demonstrates that aligning supervision with model capability through graph-based pruning offers a practical path to scalable, high-quality Long-CoT distillation across math and non-math tasks.

Abstract

Long chain-of-thought (Long-CoT) reasoning improves accuracy in LLMs, yet its verbose, self-reflective style often hinders effective distillation into small language models (SLMs). We revisit Long-CoT compression through the lens of capability alignment and ask: Can pruning improve reasoning? We propose Prune-on-Logic, a structure-aware framework that transforms Long-CoT into logic graphs and selectively prunes low-utility reasoning steps under self-verification constraints. Through systematic analysis across three pruning strategies targeting entire chains, core reasoning, and verification, we find that verification pruning consistently improves accuracy while reducing token usage, whereas pruning reasoning steps or indiscriminate pruning degrades performance. Our study reveals that effective pruning aligns supervision with model capacity rather than merely shortening inputs. Gains hold across tasks, model scales, and CoT capability, with larger models benefiting more from pruning due to richer but more redundant reasoning. Our empirical findings highlight pruning as a structural optimization strategy for aligning CoT reasoning with SLM capacity.
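The three strategies differ only in which kinds of steps are eligible for removal. A minimal sketch, assuming each CoT step has already been labeled as "reasoning" or "verification" and assigned a utility score; the `Step` class, the `ELIGIBLE` mapping, `prune`, and the threshold are all hypothetical illustrations, not the authors' implementation or scoring method.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class Step:
    text: str
    kind: str       # hypothetical label: "reasoning" or "verification"
    utility: float  # hypothetical score in [0, 1]; higher means more worth keeping

# Assumed mapping from strategy name to the step kinds it may remove.
ELIGIBLE: Dict[str, Set[str]] = {
    "all_chain": {"reasoning", "verification"},
    "reasoning_only": {"reasoning"},
    "verification_only": {"verification"},
}

def prune(steps: List[Step], strategy: str, threshold: float) -> List[Step]:
    """Drop eligible steps scoring below `threshold`, preserving the original order."""
    eligible = ELIGIBLE[strategy]
    return [s for s in steps if s.kind not in eligible or s.utility >= threshold]

steps = [
    Step("Let x = 3.", "reasoning", 0.9),
    Step("Check: 3 + 3 = 6, correct.", "verification", 0.2),
    Step("So 2x = 6.", "reasoning", 0.3),
    Step("Wait, let me double-check that again.", "verification", 0.1),
]
print([s.text for s in prune(steps, "verification_only", threshold=0.5)])
# → ['Let x = 3.', 'So 2x = 6.']
```

With `reasoning_only`, the same threshold would instead remove the derivation step `So 2x = 6.` and keep both checks, which illustrates how pruning core reasoning can break the chain. The scores and threshold here are illustrative, not values from the paper.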

Paper Structure

This paper contains 37 sections, 15 equations, 6 figures, 8 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Comparison of token-level vs. logic-based CoT compression. (A) Long2Short trims surface tokens (xia2025tokenskip; hou2025thinkprune), risking semantic loss. (B) Prune-on-Logic preserves deductive structure via equivalence-based graph pruning.
  • Figure 2: Overview of the Prune-on-Logic framework. Given a Long-CoT sequence, we build a logic graph that captures deductive dependencies and, through self-pruning under logical equivalence constraints, scores and eliminates low-impact reasoning steps. We explore three pruning strategies (All-chain, Reasoning-only, and Verification-only) and fine-tune SLMs on the compressed CoTs to enhance both efficiency and reasoning robustness.
  • Figure 3: CommonsenseQA: Verification Pruning on R1-Distill-Qwen-7B.
  • Figure 4: All-Verification pruning enhances accuracy while reducing token usage. (a): Models with (R1-Distill-Llama-8B) and without (Llama3.1) Long-CoT capability both improve, showing generality across reasoning supervision. (b): The larger R1-Distill-Qwen model (7B) gains more from pruning than the smaller one (1.5B), highlighting that benefits scale with capacity.
  • Figure 5: Logic Unit Segmentation Prompt. ($P_{logic}$)
  • ...and 1 more figure
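Figure 2 describes building a logic graph over deductive dependencies and then scoring and removing low-impact steps. A minimal sketch, assuming each logic unit is a node with a list of nodes it depends on and a precomputed impact score (both hypothetical inputs; the paper's actual scoring is not reproduced). In place of the paper's logical-equivalence constraint, the sketch substitutes a simpler structural rule for illustration: a node is removed only if it scores below a threshold and no kept node depends on it, and the answer node is always kept. The names `prune_graph`, `deps`, and `scores` are hypothetical.

```python
from typing import Dict, List, Set

def prune_graph(
    deps: Dict[str, List[str]],  # hypothetical format: node -> nodes it depends on
    scores: Dict[str, float],    # hypothetical impact score per node
    answer: str,                 # node holding the final answer; never removed
    threshold: float,
) -> Set[str]:
    """Remove low-impact nodes until no further node can be removed safely."""
    kept = set(deps)
    changed = True
    while changed:
        changed = False
        # Visit candidates from lowest to highest score (iterates over a snapshot).
        for node in sorted(kept, key=lambda n: scores[n]):
            if node == answer or scores[node] >= threshold:
                continue
            # Structural stand-in for an equivalence check: skip any node
            # that some still-kept node depends on.
            if any(node in deps[other] for other in kept if other != node):
                continue
            kept.discard(node)
            changed = True
    return kept

deps = {
    "s1": [],        # premise
    "s2": ["s1"],    # derivation from the premise
    "v1": ["s2"],    # verification of s2
    "v2": ["v1"],    # re-verification of v1
    "ans": ["s2"],   # final answer
}
scores = {"s1": 0.3, "s2": 0.8, "v1": 0.2, "v2": 0.1, "ans": 1.0}
print(sorted(prune_graph(deps, scores, answer="ans", threshold=0.5)))
# → ['ans', 's1', 's2']
```

Here `s1` scores below the threshold but survives because `s2` still depends on it, while the low-scoring verification chain `v1`, `v2` is removed. Repeating passes until nothing changes ensures a node becomes removable once all of its dependents have been pruned.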