SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens

Yinhan He, Wendy Zheng, Yaochen Zhu, Zaiyi Zheng, Lin Su, Sriram Vasudevan, Qi Guo, Liangjie Hong, Jundong Li

TL;DR

SemCoT tackles the verbosity and inefficiency of traditional Chain-of-Thought reasoning by jointly optimizing semantic alignment between implicit reasoning and ground-truth steps and by employing a lightweight, distillation-based implicit reasoning generator. A contrastively trained sentence transformer measures and enforces alignment in embedding space, while a distillation-guided generator produces fast, semantically faithful implicit tokens that feed into the LLM. The method includes a two-stage training regime and an inference pipeline that preserves reasoning semantics while reducing per-token latency. Empirical results across five benchmarks and two open LLMs show stronger accuracy with competitive or superior efficiency compared to state-of-the-art implicit-CoT baselines, demonstrating practical gains for efficient, reliable reasoning in real-world settings.

Abstract

The verbosity of Chain-of-Thought (CoT) reasoning hinders its mass deployment in efficiency-critical applications. Recently, implicit CoT approaches have emerged, which encode reasoning steps within an LLM's hidden embeddings (termed "implicit reasoning") rather than explicit tokens. This approach accelerates CoT by reducing the reasoning length and bypassing some LLM components. However, existing implicit CoT methods face two significant challenges: (1) they fail to preserve the semantic alignment between the implicit reasoning (when transformed to natural language) and the ground-truth reasoning, resulting in significant CoT performance degradation, and (2) they focus on reducing the length of the implicit reasoning but neglect the considerable time cost for an LLM to generate each individual implicit reasoning token. To tackle these challenges, we propose a novel semantically-aligned implicit CoT framework termed SemCoT. In particular, for the first challenge, we design a contrastively trained sentence transformer that evaluates semantic alignment between implicit and explicit reasoning, which is used to enforce semantic preservation during implicit reasoning optimization. To address the second challenge, we introduce an efficient implicit reasoning generator by finetuning a lightweight language model using knowledge distillation. This generator is guided by our sentence transformer to distill ground-truth reasoning into semantically aligned implicit reasoning, while also optimizing for accuracy. SemCoT is the first approach that enhances CoT efficiency by jointly optimizing token-level generation speed and preserving semantic alignment with ground-truth reasoning. Extensive experiments demonstrate the superior performance of SemCoT compared to state-of-the-art methods in both efficiency and effectiveness. Our code can be found at https://github.com/YinhanHe123/SemCoT/.
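
The idea of measuring alignment between implicit and explicit reasoning in embedding space can be illustrated with a small sketch. The abstract does not state the exact objectives, so everything below is an assumption made for illustration: the function names `alignment_loss` and `info_nce` are hypothetical, cosine distance stands in for whatever alignment measure the sentence transformer provides, and an InfoNCE-style loss stands in for the contrastive training objective. Embeddings are plain lists of floats rather than real model outputs.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def alignment_loss(implicit_emb, explicit_emb):
    """Assumed alignment penalty: 0 when the embeddings point the same way."""
    return 1.0 - cosine_similarity(implicit_emb, explicit_emb)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed stand-in objective).

    The loss is low when `anchor` is closer to `positive` than to any
    embedding in `negatives`.
    """
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    explicit = [1.0, 0.0]        # toy embedding of ground-truth reasoning
    aligned = [1.0, 0.1]         # toy implicit reasoning that matches it
    unrelated = [0.0, 1.0]       # toy implicit reasoning that does not
    print(alignment_loss(aligned, explicit))
    print(alignment_loss(unrelated, explicit))
    print(info_nce(explicit, aligned, [unrelated, [-1.0, 0.0]]))
```

In this toy setup, the aligned embedding yields a smaller alignment penalty than the unrelated one, which is the property a generator guided by such a signal would be trained to exploit.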

Paper Structure

This paper contains 23 sections, 3 equations, 17 figures, 4 tables.

Figures (17)

  • Figure 1: Illustration of how implicit CoT approaches improve CoT efficiency. "Ans." is the answer. Curved arrows indicate that the tokens are generated autoregressively. The $r_i$ are explicit reasoning tokens.
  • Figure 2: Overview of the proposed SemCoT. Each cyan box is a hidden text embedding within model components, with the text content and model type varying based on the box's position in the figure. Fire and snowflake signs mean the component is trained and frozen, respectively.
  • Figure 3: Ablation study of SemCoT.
  • Figure 4: Parameter sensitivity of SemCoT.
  • Figure 5: Case study comparing SemCoT vs COCONUT. PCA plots of implicit reasoning embeddings for 3 SVAMP queries with 20 semantic variants each. SemCoT (blue) has tighter clustering than COCONUT (orange), showing its ability to generate semantically aligned reasoning.
  • ...and 12 more figures