Not All Errors Are Created Equal: ASCoT Addresses Late-Stage Fragility in Efficient LLM Reasoning

Dongxu Zhang, Ning Yang, Yiding Sun, Jihua Zhu, Jinnan Yang, Miao Xin, Baoliang Tian

TL;DR

This paper introduces ASCoT (Adaptive Self-Correction Chain-of-Thought), a method that combines efficient reasoning with robust verification and achieves a superior trade-off between inference efficiency and reasoning fidelity.

Abstract

While Chain-of-Thought (CoT) prompting empowers Large Language Models (LLMs), ensuring reasoning reliability remains an open challenge. Contrary to the prevailing cascading failure hypothesis, which posits that early errors are most detrimental, we identify a counter-intuitive phenomenon termed Late-Stage Fragility: errors introduced in later reasoning stages are significantly more prone to corrupting final answers. To address this, we introduce ASCoT (Adaptive Self-Correction Chain-of-Thought), a method harmonizing efficiency with robust verification. ASCoT first employs semantic pruning to compress redundant steps, then utilizes an Adaptive Verification Manager (AVM) to prioritize high-risk, late-stage steps via a positional impact score, triggering a Multi-Perspective Self-Correction Engine (MSCE) only when necessary. Experiments on GSM8K and MATH-500 demonstrate that ASCoT effectively reallocates computational resources: it reduces token usage by 21% to 30% for LLaMA-3.1-8B with negligible accuracy drops (<1.8%), achieving a superior trade-off between inference efficiency and reasoning fidelity.
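
The idea of prioritizing late-stage steps can be illustrated with a minimal sketch. The abstract does not give the formula for the positional impact score, so everything below is an assumption made for illustration only: the `Step` type, the per-step `confidence` value, the power-law weight `(k / n) ** alpha`, and the threshold `tau` are hypothetical and are not the authors' definitions. The sketch only shows how a position-dependent weight can make later steps more likely to be flagged for correction than earlier steps with the same confidence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    text: str
    confidence: float  # hypothetical per-step confidence in [0, 1]


def positional_weight(k: int, n: int, alpha: float = 2.0) -> float:
    """Assumed weight that grows with the 1-indexed step position k out of n."""
    return (k / n) ** alpha


def risk_score(step: Step, k: int, n: int, alpha: float = 2.0) -> float:
    """Assumed risk: position weight times the step's lack of confidence."""
    return positional_weight(k, n, alpha) * (1.0 - step.confidence)


def select_steps_to_verify(steps: List[Step], tau: float, alpha: float = 2.0) -> List[int]:
    """Return 1-indexed positions whose assumed risk exceeds the threshold tau."""
    n = len(steps)
    return [
        k
        for k, step in enumerate(steps, start=1)
        if risk_score(step, k, n, alpha) > tau
    ]


if __name__ == "__main__":
    # Four steps with identical confidence: only the later ones cross the threshold.
    chain = [Step(f"step {i}", 0.6) for i in range(1, 5)]
    print(select_steps_to_verify(chain, tau=0.2))
```

With equal confidence at every step, the assumed weight alone decides which steps are flagged, so correction effort is spent only on the tail of the chain and skipped elsewhere.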

Paper Structure

This paper contains 30 sections, 9 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: The top part (a) illustrates the standard CoT reasoning process, which often generates redundant outputs and lacks an effective self-correction mechanism when errors occur. The bottom part (b) depicts the ASCoT method, which incorporates a robust mechanism for self-correction, addressing potential reasoning flaws at each stage.
  • Figure 2: Illustration of the ASCoT pipeline. ASCoT generates the $CoT_{initial}$ from the target LLM, which is then compressed by the IRM module to a predefined ratio $\gamma$. The compressed CoT is verified by the AVM module, and if the confidence score exceeds a threshold $\tau$, the problematic steps are sent to the MSCE module for error correction. The corrected $CoT_{final}$ is then used for fine-tuning. Finally, ASCoT enables reasoning on new problems with the specified compression ratio $\gamma$.
  • Figure 3: The details of verification and self-correction. AVM and MSCE work together to correct a faulty step $t_k$. AVM computes $R(t_k)$. MSCE then applies intrinsic and extrinsic correction to produce the final corrected $CoT_{final}$.
  • Figure 4: Performance of ASCoT on the Qwen2.5-Instruct Series. The results for the 3B, 7B, and 14B models are shown under various compression ratios and compared against the original, uncompressed baseline.
  • Figure 5: Performance comparison of ASCoT with varying maximum length constraints. We adjust the maximum length budget when evaluating our method on MATH-500.
  • ...and 2 more figures