Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models

Qiguang Chen, Dengyun Peng, Jinhao Liu, HuiKang Su, Jiannan Guan, Libo Qin, Wanxiang Che

TL;DR

Long chain-of-thought reasoning makes many LLMs inefficient because of token redundancy, especially when methods rely on static human priors for problem difficulty. The paper introduces DR. SAF, the Dynamic Reasoning-Boundary Self-Awareness Framework, which integrates Boundary Self-Awareness Alignment, Adaptive Length Management, and a Boundary Preservation Mechanism within a Group Relative Policy Optimization (GRPO) scheme to adapt reasoning depth to real-time task difficulty. Empirical results across six benchmarks show that DR. SAF achieves a 49.27% reduction in total response tokens with minimal accuracy loss, a 6.59x improvement in token efficiency, and a 5x reduction in training time, with even larger gains on stronger LLMs and under extreme compression. This self-aware, boundary-guided approach enables more efficient and scalable reasoning in resource-constrained settings while preserving or improving performance.

Abstract

Recent advancements in large language models (LLMs) have greatly improved their capabilities on complex reasoning tasks through Long Chain-of-Thought (CoT). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. To improve efficiency, current methods often rely on human-defined difficulty priors, which do not align with the LLM's own perception of difficulty and therefore lead to inefficiencies. In this paper, we introduce the Dynamic Reasoning-Boundary Self-Awareness Framework (DR. SAF), which enables models to dynamically assess and adjust their reasoning depth in response to problem complexity. DR. SAF integrates three key components: Boundary Self-Awareness Alignment, Adaptive Reward Management, and a Boundary Preservation Mechanism. These components allow models to optimize their reasoning processes, balancing efficiency and accuracy without compromising performance. Our experimental results demonstrate that DR. SAF achieves a 49.27% reduction in total response tokens with minimal loss in accuracy. The framework also delivers a 6.59x gain in token efficiency and a 5x reduction in training time, making it well-suited to resource-limited settings. During extreme training, DR. SAF can even surpass traditional instruction-based models in token efficiency with more than 16% accuracy improvement.

Paper Structure

This paper contains 48 sections, 22 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Traditional efficient-reasoning training methods (a) primarily determine the difficulty of questions based on human-defined priors, while our dynamic reasoning-boundary self-awareness framework (b) judges the difficulty of questions based on the model's self-aware reasoning boundary.
  • Figure 2: Main pipeline of Dynamic Reasoning-Boundary Self-Awareness Framework (DR. SAF), including Boundary Self-Awareness Alignment (BSA), Adaptive Length Management (ALM), and Boundary Preservation Mechanism (BPM).
  • Figure 3: Training trajectory of BSA, shown as the predicted CFRB ratio plotted against the training steps.
  • Figure 4: Comparison of the extreme efficiency of DR. SAF (DR. SAF-Ext) with traditional instruction models and current SOTA efficient-reasoning techniques.
  • Figure 5: Training efficiency comparison of DR. SAF vs. FEDH on R1-Distill-Qwen-3-8B.
  • ...and 3 more figures