Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models

Qiguang Chen; Dengyun Peng; Jinhao Liu; HuiKang Su; Jiannan Guan; Libo Qin; Wanxiang Che

TL;DR

<p>Long chain-of-thought reasoning makes many LLMs inefficient because of token redundancy, particularly when reasoning length is governed by static, human-defined priors on problem difficulty. The paper introduces DR. SAF, the Dynamic Reasoning-Boundary Self-Awareness Framework, which integrates Boundary Self-Awareness Alignment, Adaptive Reward Management, and a Boundary Preservation Mechanism within a Group Relative Policy Optimization (GRPO) scheme to adapt reasoning depth to task difficulty as the model itself perceives it. Across six benchmarks, DR. SAF reduces total response tokens by 49.27% with minimal accuracy loss, improves token efficiency by 6.59x, and cuts training time by 5x, with even larger gains on stronger LLMs and under extreme compression. This self-aware, boundary-guided approach makes reasoning more efficient and scalable in resource-constrained settings while largely preserving, and in some settings improving, accuracy.</p>

Abstract

Recent advancements in large language models (LLMs) have greatly improved their capabilities on complex reasoning tasks through Long Chain-of-Thought (CoT). However, this approach often produces substantial redundancy, which impairs computational efficiency and causes significant delays in real-time applications. To improve efficiency, current methods often rely on human-defined difficulty priors, which do not align with the difficulty the LLM itself perceives and therefore allocate reasoning effort poorly. In this paper, we introduce the Dynamic Reasoning-Boundary Self-Awareness Framework (DR. SAF), which enables models to dynamically assess and adjust their reasoning depth in response to problem complexity. DR. SAF integrates three key components: Boundary Self-Awareness Alignment, Adaptive Reward Management, and a Boundary Preservation Mechanism. These components allow models to optimize their reasoning processes, balancing efficiency and accuracy without compromising performance. Our experimental results demonstrate that DR. SAF achieves a 49.27% reduction in total response tokens with minimal loss in accuracy. The framework also delivers a 6.59x gain in token efficiency and a 5x reduction in training time, making it well-suited to resource-limited settings. During extreme training, DR. SAF can even surpass traditional instruction-based models in token efficiency, with an accuracy improvement of more than 16%.
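
To make the GRPO setting named in the TL;DR more concrete, the sketch below shows the group-relative advantage that GRPO computes: several responses are sampled for one prompt, and each reward is normalized by the mean and standard deviation of its group. The reward used here (`toy_reward`, with its `max_tokens` and `length_weight` parameters) is a hypothetical stand-in chosen only to show how a length term can interact with correctness. It is not the reward defined by DR. SAF, and the use of the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev


def toy_reward(correct: bool, num_tokens: int,
               max_tokens: int = 1000, length_weight: float = 0.5) -> float:
    """Hypothetical reward: 1 for a correct answer minus a length penalty.

    Illustrative assumption only; this is NOT the DR. SAF reward.
    """
    accuracy_term = 1.0 if correct else 0.0
    length_penalty = length_weight * min(num_tokens, max_tokens) / max_tokens
    return accuracy_term - length_penalty


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own sampling group, as in GRPO."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std would also be a valid choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled responses: (is_correct, response_length_in_tokens).
group = [(True, 200), (True, 800), (False, 300), (False, 900)]
rewards = [toy_reward(correct, tokens) for correct, tokens in group]
advantages = group_relative_advantages(rewards)

for (correct, tokens), adv in zip(group, advantages):
    print(f"correct={correct!s:5} tokens={tokens:4d} advantage={adv:+.3f}")
```

Under this toy reward, the short correct response receives the largest advantage and the long incorrect one the smallest, which is the kind of pressure toward concise, correct reasoning that a length-aware reward is meant to create.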
