Concise Reasoning in the Lens of Lagrangian Optimization

Chengqian Gao, Haonan Li, Taylor W. Killian, Jianshu She, Renxi Wang, Liqun Ma, Zhoujun Cheng, Shibo Hao, Zhiqiang Xu

TL;DR

This work addresses overthinking in large language model reasoning by introducing Performance-Aware Length Update (PALU), a principled method that treats concise reasoning as a constrained optimization problem. The core idea is to minimize the per-question generation length $L$ while keeping performance above a threshold $C$. Lagrangian relaxation turns this into the minimax objective $\min_{\bm{\theta},L>0}\max_{\lambda\ge0}\mathcal{L}(\bm{\theta},L,\lambda) = L + \lambda (C - R(\bm{\theta},L,q))$, where $R(\bm{\theta},L,q)$ is the performance on question $q$ under budget $L$ and $\lambda$ is the Lagrange multiplier. PALU then employs three pragmatic approximations (off-policy performance estimation, a two-regime budget controller, and a quantile-driven update) to realize an efficient, adaptive budgeting mechanism that scales across domains and model sizes. Empirically, PALU reduces generation length by 65% and improves accuracy by 15%, averaged over five benchmarks, when applied to DeepSeek-R1-Distill-Qwen-1.5B. It also adapts across mathematics, logic, and STEM domains and across model scales from 1.5B to 14B parameters. The work shows that a principled optimization framework coupled with pragmatic approximations can deliver concise reasoning that is robust across domains and scales, with practical benefits for inference cost and user experience.
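The step from the length-minimization constraint to the minimax objective is the standard Lagrangian argument, written out below with the TL;DR's symbols. The dual step size $\eta$ is an assumption added for illustration and does not come from the paper. The abstract's "truncating the Lagrange multiplier to two extremes" can be read as collapsing $\lambda$ to either $0$ or a very large value.

```latex
\begin{aligned}
&\min_{\bm{\theta},\,L>0}\; L
  \quad\text{s.t.}\quad R(\bm{\theta},L,q) \ge C \\[4pt]
&\mathcal{L}(\bm{\theta},L,\lambda) = L + \lambda\bigl(C - R(\bm{\theta},L,q)\bigr), \qquad \lambda \ge 0 \\[4pt]
&\max_{\lambda\ge 0}\mathcal{L}(\bm{\theta},L,\lambda) =
  \begin{cases}
    L, & R(\bm{\theta},L,q) \ge C \quad(\text{attained at } \lambda = 0)\\
    +\infty, & R(\bm{\theta},L,q) < C \quad(\lambda \to \infty)
  \end{cases} \\[4pt]
&\frac{\partial \mathcal{L}}{\partial \lambda} = C - R(\bm{\theta},L,q)
  \;\Rightarrow\;
  \lambda \leftarrow \max\bigl(0,\ \lambda + \eta\,(C - R(\bm{\theta},L,q))\bigr)
\end{aligned}
```

The inner maximum is infinite whenever the constraint is violated, so minimizing it over $\bm{\theta}$ and $L$ recovers the original constrained problem. The projected ascent step raises $\lambda$ exactly when performance falls below $C$.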

Abstract

Concise reasoning in large language models seeks to generate only the essential intermediate steps needed to reach a final answer, thereby alleviating overthinking. Most existing approaches hinge on carefully hand-crafted heuristics. They struggle to balance concision with performance and often fail to adapt across domains and model scales. In this work, we address these challenges by introducing a principled and pragmatic strategy, performance-aware length updating (PALU). As a principled algorithm, PALU formulates concise reasoning as a constrained optimization problem, minimizing response length subject to a performance constraint, and then applies Lagrangian optimization to convert it into a tractable unconstrained problem. As a pragmatic solution, PALU streamlines complicated update rules through three approximations: (i) estimating performance with off-policy rollouts, (ii) truncating the Lagrange multiplier to two extremes, and (iii) replacing gradient-based updates with quantile-driven length adjustments. When applied to DeepSeek-R1-Distill-Qwen-1.5B, PALU reduces output length by 65% while improving accuracy by 15%, averaged over five benchmarks, outperforming a range of alternative methods. Furthermore, PALU adapts across both domains (logic, STEM, and math) and model scales (1.5B, 7B, and 14B), establishing it as a practical and effective approach to concise reasoning.
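The abstract names PALU's three approximations but not its exact update rule. The sketch below shows one way they could combine to produce a single question's next length budget. Everything in it is a hypothetical illustration, not PALU's published update: the functions `linear_quantile` and `update_budget`, the rule that relaxes toward `max_budget` when the constraint is violated, and the parameters `tau`, `alpha`, and `max_budget`. The step size `alpha` is only loosely analogous to the step size $\alpha_{\tau}$ ablated in Figure 4.

```python
from typing import Sequence


def linear_quantile(values: Sequence[float], q: float) -> float:
    """q-quantile (0 <= q <= 1) with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def update_budget(
    budget: float,
    lengths: Sequence[int],
    correct: Sequence[bool],
    threshold_c: float,
    tau: float = 0.5,
    alpha: float = 0.5,
    max_budget: float = 4096.0,
) -> float:
    """Hypothetical per-question budget update (assumed rule, not the paper's)."""
    # (i) Off-policy performance estimate: accuracy over rollouts that were
    # already sampled (e.g. by an earlier policy), not fresh on-policy samples.
    acc = sum(correct) / len(correct)
    if acc < threshold_c or not any(correct):
        # (ii) Constraint violated: treat lambda as "very large", so restoring
        # performance dominates and the budget relaxes toward max_budget.
        target = max_budget
    else:
        # (ii) Constraint satisfied: treat lambda as 0, so only length matters.
        # (iii) Quantile-driven step: move toward the tau-quantile of the
        # correct rollouts' lengths instead of taking a gradient step on L.
        correct_lengths = [n for n, ok in zip(lengths, correct) if ok]
        target = linear_quantile(correct_lengths, tau)
    return budget + alpha * (target - budget)
```

For example, `update_budget(1000, [200, 400, 600, 800], [True, True, True, False], 0.5)` estimates accuracy at 0.75 and satisfies the threshold. It therefore moves halfway from 1000 toward the median correct length of 400 and returns `700.0`. Moving by a fraction of the gap, rather than jumping to the quantile, keeps the budget from oscillating on noisy rollout estimates.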

Paper Structure

This paper contains 41 sections, 15 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: Token-length distribution of correct rollouts from the DeepSeek-R1-Distill-Qwen series of reasoning models. Box plots indicate the range between the 25th and 75th percentiles.
  • Figure 2: Left: Performance-conciseness evolution of PALU on the AIME24 evaluation set, with the Spearman's correlation between performance and conciseness encoded as red (negative) and green (positive) regions. Right: Distribution of generation lengths under PALU and ShorterBetter [yi2025shorterbetter].
  • Figure 3: Conciseness-performance evolution of DeepSeek-R1-Distill-Qwen-1.5B trained with different concise reasoning methods. The training dataset covers questions from three domains: math, logic, and STEM. Curves are smoothed with a time-weighted exponential moving average.
  • Figure 4: Ablation on the step size $\alpha_{\tau}$.
  • Figure 5: Overthinking LLMs exhibit broad variation in the length of (correct) generations (see the generation-length distribution figure). Token-length distributions of correct responses from open-source reasoning LLMs (DeepSeek-R1-Distill-Qwen, Qwen3, and DeepSeek-R1-0528) on 18 randomly selected questions from the Guru dataset. Box plots show the interquartile range (25th–75th percentiles).
  • ...and 3 more figures
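Figures 1 and 5 summarize how much the length of correct responses varies per question, using box plots over the 25th–75th percentiles. A minimal sketch of that summary follows. It assumes linear interpolation between order statistics, which is NumPy's default percentile method. The function names, question IDs, and lengths are hypothetical.

```python
from typing import Dict, List, Tuple


def linear_quantile(values: List[float], q: float) -> float:
    """q-quantile (0 <= q <= 1) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def length_spread(
    correct_lengths: Dict[str, List[int]],
) -> Dict[str, Tuple[float, float, float]]:
    """Per question: (25th percentile, 75th percentile, interquartile range)."""
    spread = {}
    for qid, lengths in correct_lengths.items():
        if not lengths:
            continue  # no correct responses for this question: nothing to summarize
        q25 = linear_quantile(lengths, 0.25)
        q75 = linear_quantile(lengths, 0.75)
        spread[qid] = (q25, q75, q75 - q25)
    return spread


# Hypothetical token lengths of correct responses for two questions.
example = {"q1": [1000, 2000, 3000, 4000, 5000], "q2": []}
print(length_spread(example))
```

A wide interquartile range for a question signals that the model sometimes solves it much more concisely than usual. This is the per-question slack a length budget can exploit.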