Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models

Zekai Zhao, Qi Liu, Kun Zhou, Zihan Liu, Yifei Shao, Zhiting Hu, Biwei Huang

TL;DR

This work addresses how to elicit long chain-of-thought (long-CoT) reasoning in large language models without costly training. It identifies a small set of high-impact activations in the last few layers that govern long-CoT traits such as output length and self-reflection, and shows that amplifying these activations while inserting a "wait" token elicits long-CoT behavior at inference time without any training. The authors characterize the activation dynamics, finding a sharp rise after trigger tokens followed by an exponential decay, and fit simple analytic functions that reproduce these trajectories for inference-time control. They propose two practical methods: a training-free activation-control approach (EELo-CoT) and a parameter-efficient fine-tuning scheme. Both improve reasoning performance and self-reflection on math and science benchmarks, and the fine-tuning scheme trains far fewer parameters than full LoRA fine-tuning. The gains hold across multiple models and datasets, suggesting that training-light methods can enhance deep reasoning in LLMs.

Abstract

Although long chain-of-thought (CoT) reasoning yields remarkable performance, eliciting this ability in large language models (LLMs) typically requires costly reinforcement learning or supervised fine-tuning on high-quality distilled data. We investigate the internal mechanisms behind this capability and show that a small set of high-impact activations in the last few layers largely governs long-form reasoning attributes, such as output length and self-reflection. By simply amplifying these activations and inserting "wait" tokens, we can invoke the long CoT ability without any training, significantly increasing self-reflection rates and accuracy. Moreover, we find that the activation dynamics follow predictable trajectories, with a sharp rise after special tokens and a subsequent exponential decay. Building on these insights, we introduce a general training-free activation control technique. It uses a few contrastive examples to identify key activations and employs simple analytic functions to modulate their values at inference time to elicit long CoTs. Extensive experiments confirm that our method efficiently elicits long CoT reasoning in LLMs and improves their performance. Additionally, we propose a parameter-efficient fine-tuning method that trains only a last-layer activation amplification module and a few LoRA layers, outperforming full LoRA fine-tuning on reasoning benchmarks with significantly fewer parameters. Our code and data are publicly released.
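
The two steps the abstract describes, picking key activations from contrastive examples and then modulating them with an analytic function, can be illustrated with a small toy sketch. This is not the authors' released code. The function names, the mean-absolute-gap ranking criterion, the `peak` and `decay_rate` parameters, and the exact form `1 + (peak - 1) * exp(-decay_rate * (t - t_trigger))` are all assumptions chosen only to mirror the "sharp rise, then exponential decay" description; the paper's fitted function may differ.

```python
import math


def select_key_activations(long_cot, base, top_k):
    """Hypothetical selection rule: rank hidden dimensions by the gap between
    mean activations on long-CoT examples and on base examples, keep top_k."""
    n_dims = len(long_cot[0])
    gaps = []
    for d in range(n_dims):
        mean_long = sum(row[d] for row in long_cot) / len(long_cot)
        mean_base = sum(row[d] for row in base) / len(base)
        gaps.append((abs(mean_long - mean_base), d))
    gaps.sort(reverse=True)
    return sorted(d for _, d in gaps[:top_k])


def amplification_scale(step, trigger_step, peak=2.0, decay_rate=0.1):
    """Assumed schedule: 1.0 before the trigger token, a jump to `peak` at the
    trigger, then exponential decay back toward 1.0."""
    if trigger_step is None or step < trigger_step:
        return 1.0
    return 1.0 + (peak - 1.0) * math.exp(-decay_rate * (step - trigger_step))


def amplify(hidden, key_dims, scale):
    """Return a copy of one hidden-state vector with the key dimensions scaled."""
    out = list(hidden)
    for d in key_dims:
        out[d] *= scale
    return out


# Toy contrastive data: rows are examples, columns are hidden dimensions.
long_cot_acts = [[0.1, 5.0, 0.2, -3.0], [0.0, 4.0, 0.3, -2.0]]
base_acts = [[0.1, 1.0, 0.2, -0.5], [0.1, 1.0, 0.2, -0.5]]

key_dims = select_key_activations(long_cot_acts, base_acts, top_k=2)

# Pretend a "wait" token was inserted at decoding step 5.
trigger_step = 5
for step in (4, 5, 10, 50):
    scale = amplification_scale(step, trigger_step)
    print(step, round(scale, 4), amplify([1.0, 2.0, 3.0, 4.0], key_dims, scale))
```

In a real model the `amplify` step would run inside a forward hook on the last few layers rather than on plain lists. The sketch only shows how a contrast-based selection and a decaying scale factor fit together.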

Paper Structure

This paper contains 42 sections, 4 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: (a) Sparse Activations when processing Long CoT
  • Figure 2: (b) Model Accuracy and Amplification Scale
  • Figure 3: (c) Self-Reflection Ratio and Amplification Scale
  • Figure 4: (a) Wait Token Insert Induces Long-CoT Reasoning
  • Figure 5: (b) Activation Patterns of base and long CoT LLMs
  • ...and 5 more figures