TRiMS: Real-Time Tracking of Minimal Sufficient Length for Efficient Reasoning via RL

Tingcheng Bian, Jinchang Luo, Mingquan Cheng, Jinyu Zhang, Xiaoling Xia, Ni Li, Yan Tao, Haiwei Wang

Abstract

Large language models achieve breakthroughs in complex reasoning via long chain-of-thought (CoT) sequences. However, this often leads to severe reasoning inflation, causing substantial computational redundancy. To maximize Intelligence per Token, we introduce a theoretical metric, Minimal Sufficient Length (MSL), which rigorously characterizes the shortest reasoning length that preserves answer correctness. We provide a recursive definition based on independently sampled sequences and prove the existence of its limit, establishing the first measurable lower bound for reasoning-chain compression. Building on an analysis of mainstream CoT compression strategies, we identify key structural factors that enable a model to approach MSL. Based on these insights, we propose TRiMS, which combines the GRPO algorithm with MSL-based estimation during training and mitigates training instabilities through dynamic batch aggregation and advantage computation using batch-level standard deviation. TRiMS achieves over 80% CoT token reduction with a minor accuracy boost across all benchmarks.
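
The abstract mentions advantage computation using batch-level standard deviation. The following is only a minimal sketch of one way such a normalization could look, not the authors' implementation: it assumes rewards are centered on the mean of their own prompt group (as in standard GRPO) and then scaled by the standard deviation of all rewards in the batch. The function name `grpo_advantages_batch_std`, the `eps` constant, and the list-of-lists input layout are hypothetical choices made for illustration.

```python
from statistics import fmean, pstdev


def grpo_advantages_batch_std(groups, eps=1e-6):
    """Illustrative advantage computation (assumed form, not from the paper).

    groups: list of lists; each inner list holds the scalar rewards of the
            rollouts sampled for one prompt.
    Returns a list of lists of advantages with the same shape.
    """
    # Standard deviation over every reward in the batch (assumption:
    # this replaces the per-group standard deviation of plain GRPO).
    all_rewards = [r for group in groups for r in group]
    batch_std = pstdev(all_rewards)

    advantages = []
    for group in groups:
        group_mean = fmean(group)  # per-group baseline, as in GRPO
        advantages.append([(r - group_mean) / (batch_std + eps) for r in group])
    return advantages


adv = grpo_advantages_batch_std([[1.0, 0.0], [1.0, 1.0]])
```

In this sketch a group whose rewards are all identical (the second group above) receives zero advantages without dividing by a near-zero per-group standard deviation, which is one plausible motivation for scaling at the batch level.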

Paper Structure (39 sections, 6 equations, 12 figures, 2 tables)

Figures (12)

  • Figure 1: (Top) Conceptual illustration of diverse reasoning paths generated by an LLM. (Bottom) Expected token length of the shortest correct reasoning path (SCPT@K) as a function of the number of samples $k$.
  • Figure 2: Expected token length of the shortest correct reasoning path versus sample number $k$. (a–d) Different sampling strategies. (e) Model scaling under varying difficulty levels.
  • Figure 3: Overview of the TRiMS framework
  • Figure 4: Percentage of degenerate groups across training steps
  • Figure 5: Distribution of the proportion of correct answers across different length thresholds and difficulty levels.
  • ...and 7 more figures
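
The captions of Figures 1 and 2 refer to the expected token length of the shortest correct reasoning path as a function of the number of samples $k$. As a rough illustration of how such a quantity could be estimated from a fixed pool of sampled responses, the sketch below averages the shortest correct length over all size-$k$ subsets that contain at least one correct response, using a closed-form count in the spirit of the pass@k estimator. The conditioning on "at least one correct", the function name `scpt_at_k`, and the input format are assumptions made here; the paper's own recursive definition of MSL may differ.

```python
from math import comb


def scpt_at_k(lengths, correct, k):
    """Estimate the expected shortest-correct-path length among k samples.

    lengths: token lengths of n sampled reasoning paths.
    correct: booleans, True where the path's final answer is correct.
    k:       number of samples drawn (without replacement) from the pool.
    Returns None when no size-k subset can contain a correct path.
    """
    n = len(lengths)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(lengths)")

    correct_lengths = sorted(l for l, ok in zip(lengths, correct) if ok)
    c = len(correct_lengths)

    # Number of size-k subsets containing at least one correct path.
    denom = comb(n, k) - comb(n - c, k)
    if denom == 0:
        return None

    # The i-th shortest correct path is the subset minimum when it is chosen
    # and the i-1 shorter correct paths are not: comb(n - i, k - 1) subsets.
    numer = sum(
        l * comb(n - i, k - 1)
        for i, l in enumerate(correct_lengths, start=1)
    )
    return numer / denom


pool_lengths = [100, 200, 300, 400]
pool_correct = [True, False, True, True]
curve = [scpt_at_k(pool_lengths, pool_correct, k) for k in range(1, 5)]
```

For this toy pool the estimate decreases as $k$ grows and reaches the shortest correct length (100 tokens) at $k = 4$, matching the qualitative trend the captions describe.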