Concise Reasoning in the Lens of Lagrangian Optimization
Chengqian Gao, Haonan Li, Taylor W. Killian, Jianshu She, Renxi Wang, Liqun Ma, Zhoujun Cheng, Shibo Hao, Zhiqiang Xu
TL;DR
This work addresses overthinking in reasoning with large language models by introducing Performance-Aware Length Update (PALU), a principled method that treats concise reasoning as a constrained optimization problem. The core idea is to minimize the per-question generation length $L$ while maintaining a performance threshold $C$, reformulated via a Lagrangian $\nabla$-based minimax objective: $\min_{\bm{\theta},L>0}\max_{\lambda\ge0}\mathcal{L}(\bm{\theta},L,\lambda)$, where $\mathcal{L}(\bm{\theta},L,\lambda) = L + \lambda (C - R(\bm{\theta},L,q))$. PALU then employs three pragmatic approximations (off-policy performance estimation, a two-regime budget controller, and a quantile-driven update) to realize an efficient, adaptive budgeting mechanism that scales across domains and model sizes. Empirically, when applied to DeepSeek-R1-Distill-Qwen-1.5B, PALU reduces generation length by 65% and improves accuracy by 15%, averaged over five benchmarks, and it adapts across mathematics, logic, and STEM domains and across model scales from 1.5B to 14B parameters. The work demonstrates that a principled optimization framework coupled with pragmatic approximations can deliver robust, domain- and scale-invariant concise reasoning, with practical impact on cost and user experience.
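To make the link between the constrained problem and the minimax objective explicit, the following is a short worked derivation. It assumes that $R(\bm{\theta},L,q)$ denotes the performance of the model with parameters $\bm{\theta}$ on question $q$ under length budget $L$, and that "maintaining a performance threshold" means $R \ge C$; both readings are inferred from the objective above and are not spelled out in this summary.

```latex
\begin{aligned}
% Constrained form: the shortest budget that still meets the threshold.
&\min_{\bm{\theta},\, L>0} \; L
  \quad \text{s.t.} \quad R(\bm{\theta}, L, q) \ge C \\[4pt]
% Lagrangian with multiplier \lambda \ge 0.
&\mathcal{L}(\bm{\theta}, L, \lambda)
  = L + \lambda \bigl( C - R(\bm{\theta}, L, q) \bigr) \\[4pt]
% Inner maximization over \lambda splits into two cases.
&\max_{\lambda \ge 0} \mathcal{L}(\bm{\theta}, L, \lambda) =
\begin{cases}
  L, & \text{if } R(\bm{\theta}, L, q) \ge C \quad (\text{attained at } \lambda = 0), \\
  +\infty, & \text{if } R(\bm{\theta}, L, q) < C \quad (\lambda \to \infty).
\end{cases}
\end{aligned}
```

Under these assumptions, the outer minimization never selects a pair $(\bm{\theta}, L)$ that violates the constraint, so the minimax value coincides with that of the constrained problem. The two cases of the inner maximization are one plausible way to read the two-regime budget controller: the multiplier is effectively zero while performance is adequate and effectively unbounded once it is not.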
Abstract
Concise reasoning in large language models seeks to generate only the essential intermediate steps needed to arrive at a final answer, thereby alleviating overthinking. Most proposed approaches hinge on carefully hand-crafted heuristics; they struggle to balance concision with performance and often fail to adapt across domains and model scales. In this work, we address these challenges by introducing a principled and pragmatic strategy, performance-aware length updating (PALU). As a principled algorithm, PALU formulates concise reasoning as a constrained optimization problem, minimizing response length subject to a performance constraint, and then applies Lagrangian optimization to convert it into a tractable unconstrained problem. As a pragmatic solution, PALU streamlines complicated update rules through three approximations: (i) estimating performance with off-policy rollouts, (ii) truncating the Lagrange multiplier to two extremes, and (iii) replacing gradient-based updates with quantile-driven length adjustments. PALU reduces output length by 65% while improving accuracy by 15% when applied to DeepSeek-R1-Distill-Qwen-1.5B, averaged over five benchmarks, outperforming a range of alternative methods. Furthermore, PALU is shown to adapt across both domains (logic, STEM, and math) and model scales (1.5B, 7B, 14B), establishing it as a practical and effective approach to concise reasoning.
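The three approximations listed above can be illustrated with a minimal Python sketch of a per-question budget update. This is not the authors' implementation. The function names, the use of empirical accuracy over previously collected rollouts as the performance estimate, the choice to shrink toward a quantile of the correct responses' lengths, and the reset to a hypothetical `max_budget` when the threshold is missed are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
from typing import Sequence


def nearest_rank_quantile(values: Sequence[float], q: float) -> float:
    """Return the nearest-rank q-quantile (0 < q <= 1) of a non-empty sequence."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def update_budget(
    budget: float,
    lengths: Sequence[int],
    correct: Sequence[bool],
    threshold: float,
    shrink_quantile: float = 0.5,   # hypothetical default
    max_budget: float = 16384.0,    # hypothetical default
) -> float:
    """Return a new length budget for one question (illustrative only).

    lengths/correct describe rollouts that were already collected, possibly
    by an earlier policy, standing in for off-policy performance estimation.
    """
    if not lengths or len(lengths) != len(correct):
        raise ValueError("lengths and correct must be non-empty and aligned")

    # (i) Performance estimate: empirical accuracy of the stored rollouts.
    accuracy = sum(correct) / len(correct)

    # (ii) Two regimes in place of a continuously updated multiplier.
    if accuracy >= threshold:
        # Constraint satisfied: act as if the multiplier were zero, so only
        # length matters.
        correct_lengths = [n for n, ok in zip(lengths, correct) if ok]
        if not correct_lengths:
            return budget
        # (iii) Quantile-driven update (assumed form): move the budget to a
        # quantile of the lengths of correct responses, never increasing it.
        return min(budget, nearest_rank_quantile(correct_lengths, shrink_quantile))

    # Constraint violated: act as if the multiplier were very large and
    # restore the budget (assumed here to mean resetting to max_budget).
    return max_budget
```

For example, with rollout lengths `[200, 400, 600, 800]`, of which the first three are correct, and a threshold of 0.5, the sketch shrinks a budget of 1000 to the median correct length. If only the longest rollout were correct, it would fall back to `max_budget` instead.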
