AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models

Feng Luo, Yu-Neng Chuang, Guanchu Wang, Hoang Anh Duy Le, Shaochen Zhong, Hongyi Liu, Jiayi Yuan, Yang Sui, Vladimir Braverman, Vipin Chaudhary, Xia Hu

TL;DR

AutoL2S tackles the inefficiency of reasoning-capable LLMs with a dynamic long-short reasoning framework that lets the model itself decide, via an <EASY> token, whether to generate a long or a short CoT. It builds a dual-path CoT training dataset with explicit long and short reasoning traces and uses it to train non-reasoning LLMs, distilling reasoning capabilities into them while teaching them to compress their output. At inference time, the model automatically selects the appropriate reasoning length, yielding CoT paths up to 57% shorter without sacrificing accuracy across multiple benchmarks. The approach improves efficiency without manual control of reasoning length and offers a practical path toward scalable, cost-effective reasoning with LLMs.
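The dual-path annotation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the easiness rule (a question counts as easy when the short CoT alone reaches the reference answer), the serialization order, and the `<SHORT>` separator token are all hypothetical. Only the `<EASY>` token comes from the paper.

```python
from dataclasses import dataclass
from typing import Callable

EASY_TOKEN = "<EASY>"    # special token named in the paper
SHORT_TOKEN = "<SHORT>"  # hypothetical separator placed before the short path


@dataclass
class DualPathSample:
    question: str
    target: str  # serialized training target for supervised fine-tuning
    easy: bool   # whether the sample was annotated as easy


def annotate(question: str, long_cot: str, short_cot: str, reference: str,
             is_correct: Callable[[str, str], bool]) -> DualPathSample:
    """Build one training sample under an assumed dual-path layout.

    Hypothetical easiness rule: the question is easy when the short CoT
    alone reaches the reference answer.
    """
    easy = is_correct(short_cot, reference)
    if easy:
        # Assumed layout for easy questions: <EASY>, the long path, the
        # separator, then the short path, so that during causal training the
        # short path is learned in the context of the long one.
        target = f"{EASY_TOKEN}{long_cot}{SHORT_TOKEN}{short_cot}"
    else:
        # Assumed layout for hard questions: the long path only.
        target = long_cot
    return DualPathSample(question=question, target=target, easy=easy)
```

For example, with a naive checker such as `lambda cot, ref: cot.strip().endswith(ref)`, a question whose short CoT ends in the reference answer is serialized with a leading `<EASY>`, while all other questions keep only the long path.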

Abstract

Reasoning-capable large language models (LLMs) demonstrate strong performance on complex reasoning tasks but often suffer from overthinking: they generate unnecessarily long chain-of-thought (CoT) reasoning paths for easy questions, which increases inference cost and latency. Recent approaches attempt to address this challenge by manually deciding when to apply long or short reasoning. However, they lack the flexibility to adapt CoT length dynamically to question complexity. In this paper, we propose Auto Long-Short Reasoning (AutoL2S), a dynamic and model-agnostic framework that enables LLMs to compress their generated reasoning paths according to the complexity of the reasoning question. AutoL2S follows a learned paradigm in which the LLM itself decides when longer reasoning is necessary and when shorter reasoning suffices. This behavior is learned by training on data annotated with our proposed method, which includes both long and short CoT paths and a special <EASY> token that indicates when the model can skip generating lengthy CoT reasoning. This annotation strategy can improve the LLM's ability to generate shorter CoT reasoning paths of higher quality after training. Extensive evaluation results show that AutoL2S reduces the length of reasoning generation by up to 57% without compromising performance, demonstrating its effectiveness for scalable and efficient LLM reasoning.
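How the <EASY> token could steer decoding at inference time is sketched below. The model interface `generate(context, max_new_tokens)`, the `<SHORT>` separator injected after `<EASY>`, and the token budget are hypothetical. The sketch assumes training placed `<EASY>` at the start of easy targets and the short path after that separator; it only illustrates the idea that the model's own first token selects the short or the long path. The `length_reduction` helper shows one common way to express a figure such as "up to 57% shorter": one minus the ratio of total generated tokens to a baseline's total.

```python
from typing import Callable, List, Tuple

EASY_TOKEN = "<EASY>"    # special token named in the paper
SHORT_TOKEN = "<SHORT>"  # hypothetical separator assumed to precede the short path

# Hypothetical model interface: given a token context and a limit,
# return up to that many newly generated tokens.
Generate = Callable[[List[str], int], List[str]]


def autol2s_decode(prompt: List[str], generate: Generate,
                   budget: int = 4096) -> Tuple[str, List[str]]:
    """Route decoding by the model's first generated token."""
    first = generate(prompt, 1)
    if first and first[0] == EASY_TOKEN:
        # The model flagged the question as easy: inject the separator
        # and decode only the short reasoning path.
        return "short", generate(prompt + [EASY_TOKEN, SHORT_TOKEN], budget)
    # No <EASY> token: keep the first token and continue the long path.
    return "long", first + generate(prompt + first, budget - len(first))


def length_reduction(baseline: List[int], routed: List[int]) -> float:
    """Relative reduction in total generated tokens versus a baseline."""
    return 1.0 - sum(routed) / sum(baseline)
```

For instance, if a baseline generates 1,000 tokens in total and the routed model generates 430, `length_reduction` reports a reduction of 0.57.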

Paper Structure

This paper contains 41 sections, 13 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: AutoL2S versus baseline methods in accuracy and reasoning length.
  • Figure 2: During inference, the LLM generates (a) a long reasoning path when the <EASY> token is absent and (b) a short reasoning path when the <EASY> token is present. The choice between the long and short CoT reasoning paths is made automatically by the model, without any human intervention.
  • Figure 3: Comparison of attention maps at early and late training steps of AutoL2S. Step 1551 corresponds to the final training step. Given the long sequence lengths, we group every 20 tokens together to calculate attention scores between long and short reasoning paths for better visualization.
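The 20-token grouping used for Figure 3 can be sketched as block-pooling an attention matrix. Averaging (rather than summing) within each block, and treating the short path as queries and the earlier long path as keys, are assumptions; the function name and its arguments are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def group_attention(attn: Matrix, q_start: int, q_end: int,
                    k_start: int, k_end: int, group: int = 20) -> Matrix:
    """Mean attention from query tokens [q_start, q_end) to key tokens
    [k_start, k_end), pooled over group x group token blocks.
    The last block in each direction may be smaller than `group`."""
    pooled: Matrix = []
    for q0 in range(q_start, q_end, group):
        q1 = min(q0 + group, q_end)
        row = []
        for k0 in range(k_start, k_end, group):
            k1 = min(k0 + group, k_end)
            # Collect every attention weight inside this query/key block.
            block = [attn[q][k] for q in range(q0, q1) for k in range(k0, k1)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled
```

As a usage example, for a hypothetical 50-token sequence whose positions 0 to 29 hold the long path and 30 to 49 the short path, `group_attention(attn, 30, 50, 0, 30)` yields a 1 x 2 grid: one row of short-path tokens against two groups of long-path tokens.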