
SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs

Yige Xu, Xu Guo, Zhiwei Zeng, Chunyan Miao

TL;DR

SoftCoT is a lightweight, continuous-space chain-of-thought method. It avoids catastrophic forgetting by keeping the backbone LLM frozen and training only a projection module that maps soft thoughts from a smaller assistant model into the LLM’s representation space. Reasoning combines three components: instance-specific soft thought tokens generated by the assistant model, the trainable projection layer, and fixed task instructions. Across five reasoning benchmarks and multiple backbones, SoftCoT consistently improves accuracy and generalizes to unseen domains, and it remains orthogonal to, and therefore combinable with, self-consistency techniques. The work highlights the potential of continuous latent reasoning to improve interpretability and efficiency without extensive fine-tuning, while acknowledging two limitations: soft thoughts do not fully replace the explicit reasoning path, and broader scalability studies are still needed.

Abstract

Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks by generating intermediate reasoning steps. However, most existing approaches focus on hard token decoding, which constrains reasoning within the discrete vocabulary space and may not always be optimal. While recent efforts explore continuous-space reasoning, they often require full-model fine-tuning and suffer from catastrophic forgetting, limiting their applicability to state-of-the-art LLMs that already perform well in zero-shot settings with a proper instruction. To address this challenge, we propose a novel approach for continuous-space reasoning that does not require modifying the LLM. Specifically, we employ a lightweight fixed assistant model to speculatively generate instance-specific soft thought tokens as the initial chain of thoughts, which are then mapped into the LLM's representation space via a trainable projection module. Experimental results on five reasoning benchmarks demonstrate that our method enhances LLM reasoning performance through supervised, parameter-efficient fine-tuning. Source code is available at https://github.com/xuyige/SoftCoT.
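
The data flow described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation (see the linked repository for that). It assumes a single linear projection layer and an input order of instruction, question, then soft thoughts. All function names, dimensions, and placeholder embeddings below are hypothetical and chosen only for illustration.

```python
import random


def make_projection(d_assistant, d_llm, seed=0):
    """Create a toy linear projection (weight, bias).

    In this sketch it stands in for the only trainable component;
    the assistant model and the backbone LLM would both stay frozen.
    """
    rng = random.Random(seed)
    scale = 1.0 / d_assistant ** 0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(d_assistant)]
              for _ in range(d_llm)]
    bias = [0.0] * d_llm
    return weight, bias


def project(soft_thoughts, weight, bias):
    """Map each assistant hidden state (length d_assistant) to length d_llm."""
    return [
        [sum(w * h for w, h in zip(row, hidden)) + b
         for row, b in zip(weight, bias)]
        for hidden in soft_thoughts
    ]


def build_llm_inputs(instruction_emb, question_emb, projected_thoughts):
    """Concatenate embeddings along the sequence axis (assumed ordering)."""
    return instruction_emb + question_emb + projected_thoughts


# Hypothetical sizes: assistant hidden size 4, LLM embedding size 6,
# and 3 soft thought tokens for this instance.
d_assistant, d_llm, num_thoughts = 4, 6, 3
rng = random.Random(1)

# Stand-in for the frozen assistant model's final hidden states.
soft_thoughts = [[rng.gauss(0.0, 1.0) for _ in range(d_assistant)]
                 for _ in range(num_thoughts)]

weight, bias = make_projection(d_assistant, d_llm)
projected = project(soft_thoughts, weight, bias)

# Placeholder embeddings for the fixed instruction and the question.
instruction_emb = [[0.0] * d_llm for _ in range(2)]
question_emb = [[0.0] * d_llm for _ in range(5)]

llm_inputs = build_llm_inputs(instruction_emb, question_emb, projected)
print(len(llm_inputs), len(llm_inputs[0]))
```

In a real setup the projected vectors would be passed to the frozen LLM as input embeddings, and only `weight` and `bias` would receive gradient updates, which is what keeps the fine-tuning parameter-efficient.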

Paper Structure

This paper contains 32 sections, 8 equations, 2 figures, and 7 tables.

Figures (2)

  • Figure 1: A comparison of SoftCoT, vanilla Chain-of-Thought, and Coconut.
  • Figure 2: The impact of thought token numbers in ASDiv-Aug using LLaMA-3.1-8B-Instruct.