SIM-CoT: Supervised Implicit Chain-of-Thought

Xilin Wei, Xiaoran Liu, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Jiaqi Wang, Xipeng Qiu, Dahua Lin

TL;DR

SIM-CoT addresses the instability of implicit chain-of-thought by introducing step-level supervision via an auxiliary decoder trained to map each latent step to its corresponding explicit reasoning step. This stabilization improves accuracy and robustness across GPT-2 and LLaMA models while preserving the inference efficiency of implicit CoT. The approach also provides per-step interpretability by projecting latent tokens onto an explicit reasoning vocabulary. Experiments show consistent gains over state-of-the-art implicit CoT baselines (Coconut, CODI); on GPT-2, SIM-CoT even surpasses explicit CoT, and it generalizes well to out-of-domain data.

Abstract

Implicit Chain-of-Thought (CoT) methods offer a token-efficient alternative to explicit CoT reasoning in Large Language Models (LLMs), but a persistent performance gap has limited their adoption. We identify a core latent instability issue when scaling the computational budget of implicit CoT: as the number of reasoning tokens increases, training often becomes unstable and collapses. Our analysis shows that this instability arises from latent representations becoming homogeneous and losing semantic diversity, caused by insufficient step-level supervision in current implicit CoT methods. To address this, we propose SIM-CoT, a plug-and-play training module that introduces step-level supervision to stabilize and enrich the latent reasoning space. SIM-CoT employs an auxiliary decoder during training to align each implicit token with its corresponding explicit reasoning step, ensuring latent states capture distinct and meaningful information. The auxiliary decoder is removed at inference, preserving the efficiency of implicit CoT with no added overhead. It also provides interpretability by projecting each latent token onto an explicit reasoning vocabulary, enabling per-step visualization and diagnosis. SIM-CoT significantly improves both in-domain accuracy and out-of-domain stability of implicit CoT methods, boosting Coconut by +8.2\% on GPT-2 and CODI by +3.0\% on LLaMA-3.1 8B. It further surpasses the explicit CoT baseline on GPT-2 by 2.1\% with 2.3$\times$ greater token efficiency, while closing the performance gap on larger models like LLaMA-3.1 8B. Code: https://github.com/InternLM/SIM-CoT
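The step-level supervision described in the abstract can be illustrated with a minimal sketch. It assumes the training objective is the usual answer loss plus a weighted cross-entropy term in which an auxiliary decoder, conditioned on latent k, predicts the tokens of explicit step k. The function names, the `aux_weight` parameter, and the per-token averaging are hypothetical choices made for illustration, not the authors' implementation; see the linked repository for the actual code.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_nll(logits, target_id):
    """Negative log-likelihood of one target token under the given logits."""
    return -log_softmax(logits)[target_id]


def step_level_loss(step_logits, step_targets):
    """Average NLL over the tokens of every explicit reasoning step.

    step_logits[k][t]: logits a (hypothetical) auxiliary decoder produces for
                       token t of explicit step k, conditioned on latent k.
    step_targets[k][t]: the matching ground-truth token id.
    """
    total, count = 0.0, 0
    for logits_k, targets_k in zip(step_logits, step_targets):
        for logits, target in zip(logits_k, targets_k):
            total += token_nll(logits, target)
            count += 1
    return total / max(count, 1)


def training_loss(answer_nll, step_logits, step_targets, aux_weight=1.0):
    """Answer loss plus the weighted step-level term (used in training only).

    aux_weight is an assumed hyperparameter; at inference the auxiliary
    decoder is dropped, so only the answer path is evaluated.
    """
    return answer_nll + aux_weight * step_level_loss(step_logits, step_targets)
```

Because the auxiliary term only appears in `training_loss`, removing the decoder at inference leaves the cost of producing latents and the answer unchanged, which matches the "no added overhead" claim above.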

Paper Structure

This paper contains 42 sections, 18 equations, 7 figures, 8 tables, 2 algorithms.

Figures (7)

  • Figure 1: (a) The latent instability issue: using more implicit tokens initially improves accuracy, but training then becomes unstable and sometimes collapses. (b) Information Loss: the implicit tokens of failed models (5 latent tokens) lose crucial information about operators (such as $+$, $-$), which makes complex reasoning impossible. (c) Shifted Distance: in failed models, the latent-to-latent distances shrink, so the latents become too similar to one another, while the latents drift away from the central vocabulary embedding space. (d) Semantic Homogenization: failed models produce similar latent representations, resulting in a narrower range of decoded tokens, mostly numbers, as opposed to the more varied content generated by a normal model.
  • Figure 2: The framework comparison between Coconut (upper left), CODI (upper right), and our SIM-CoT (bottom). Unlike Coconut and CODI, which apply coarse-grained supervision on answers or trajectories, our SIM-CoT employs a decoder to align implicit latents with step-level reasoning, enhancing performance while maintaining inference efficiency.
  • Figure 3: Ablation study on different numbers of implicit latents. The x-axis denotes the number of implicit latents and implicit tokens (joined with “-”), while the y-axis denotes accuracy. The blue line corresponds to our method SIM-CoT, and the orange line corresponds to the baseline Coconut.
  • Figure 4: SIM-CoT case study on GSM8K. The generated implicit continuous tokens are interpreted by our decoder, which visualizes the intermediate solution steps leading to the final output.
  • Figure 5: Distribution of reasoning steps in the GSM8K-Aug training dataset. Most problems involve two to four steps, with a long tail of harder cases. For visualization, step counts with fewer than 200 problems are omitted, though all examples are used in training.
  • ...and 2 more figures
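
The two quantities in the Figure 1(c) caption, latent-to-latent distance and drift from the vocabulary embedding space, can be sketched as simple diagnostics. The sketch below assumes Euclidean distance and uses the centroid of the vocabulary embeddings as a stand-in for the "central" embedding region; both choices and all function names are hypothetical, since the caption does not specify the exact metric.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mean_pairwise_distance(latents):
    """Average distance over all pairs of latents.

    A value shrinking toward 0 indicates that the latents are homogenizing.
    """
    pairs = list(combinations(latents, 2))
    if not pairs:
        return 0.0
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


def mean_distance_to_vocab_center(latents, vocab_embeddings):
    """Average distance from each latent to the vocabulary centroid.

    A growing value indicates drift away from the embedding space.
    """
    center = centroid(vocab_embeddings)
    return sum(euclidean(z, center) for z in latents) / len(latents)
```

Under these assumptions, a collapsing run would show the first metric falling while the second rises as the number of latent tokens grows.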