TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression
Zhong-Zhi Li, Xiao Liang, Zihao Tang, Lei Ji, Peijie Wang, Haotian Xu, Xing W, Haizhen Huang, Weiwei Deng, Yeyun Gong, Zhijiang Guo, Xiao Liu, Fei Yin, Cheng-Lin Liu
TL;DR
TlDr introduces a dynamic thinking-length reweighting framework to compress LLM reasoning by adaptively balancing short CoT (System-1) and long CoT (System-2) data during post-training. By estimating upper bounds on efficiency and accuracy and updating data ratios in real time, TlDr achieves around 40% token reduction on DeepSeek-R1-Distill-7B/14B with little degradation in reasoning performance. The method outperforms static data mixtures and token-budgeted baselines, while requiring simpler data construction and no extensive problem-by-problem annotations. This approach offers a practical path toward efficient, scalable reasoning in large language models for diverse benchmarks and problem difficulties.
Abstract
Large Language Models (LLMs) have recently achieved remarkable progress by leveraging Reinforcement Learning and extended Chain-of-Thought (CoT) techniques. However, the challenge of efficient language reasoning, especially during inference with extremely long outputs, has drawn increasing attention from the research community. In this work, we propose a dynamic ratio-based training pipeline that does not rely on sophisticated data annotations or interpolation between multiple models. We continuously balance the weights between the model's System-1 and System-2 data to eliminate redundant reasoning processes while preserving the model's reasoning capability. We validate our approach on DeepSeek-R1-Distill-7B and DeepSeek-R1-Distill-14B across a diverse set of benchmarks with varying difficulty levels. Our method reduces the number of output tokens by nearly 40% while maintaining reasoning accuracy. Our code and data will be available soon.
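To make the idea of continuously rebalancing System-1 and System-2 data more concrete, here is a minimal Python sketch. It is not the authors' implementation: the update rule, the names `MixState`, `update_ratio`, and `batch_counts`, and the parameters `step` and `tol` are all hypothetical, chosen only to illustrate how a data ratio could be adjusted from measured accuracy and output length against reference bounds.

```python
from dataclasses import dataclass


@dataclass
class MixState:
    # Fraction of short-CoT (System-1) samples in the next training batch.
    short_ratio: float


def update_ratio(state, acc, acc_upper, tokens, tokens_lower,
                 step=0.1, tol=0.01):
    """Hypothetical update rule (an assumption, not taken from the paper).

    acc          -- current validation accuracy
    acc_upper    -- reference accuracy bound (e.g. the original long-CoT model)
    tokens       -- current mean output length in tokens
    tokens_lower -- reference length bound (e.g. a short-CoT model)
    """
    if acc < acc_upper - tol:
        # Accuracy has slipped: shift weight toward long-CoT (System-2) data.
        new_ratio = state.short_ratio - step
    elif tokens > tokens_lower:
        # Accuracy holds but outputs are still long: add short-CoT data.
        new_ratio = state.short_ratio + step
    else:
        new_ratio = state.short_ratio
    # Keep the ratio a valid fraction.
    return MixState(short_ratio=min(1.0, max(0.0, new_ratio)))


def batch_counts(state, batch_size):
    """Split a batch into (short-CoT count, long-CoT count)."""
    n_short = round(state.short_ratio * batch_size)
    return n_short, batch_size - n_short
```

In a training loop, one would call `update_ratio` after each evaluation round and use `batch_counts` to decide how many short-CoT and long-CoT samples to draw for the next batches. The fixed-step rule here is only one of many possible choices.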
