Understanding Hidden Computations in Chain-of-Thought Reasoning

Aryasomayajula Ram Bharadwaj

TL;DR

This work probes how transformer models perform reasoning when explicit chain-of-thought steps are replaced by filler tokens. By applying the logit lens and token-ranking analyses to a 3SUM-based setup, the authors show that hidden reasoning steps persist across layers and can be recovered by examining lower-ranked tokens, without degrading task performance. A modified decoding scheme that bypasses fillers recovers the original reasoning in practice, supporting the view that the underlying computation is preserved internally and that the fillers overwrite it only when the output is produced. These results advance interpretability in language models and highlight how the model balances computation against output generation, with implications for the transparency and controllability of chain-of-thought reasoning.

Abstract

Chain-of-Thought (CoT) prompting has significantly enhanced the reasoning abilities of large language models. However, recent studies have shown that models can still perform complex reasoning tasks even when the CoT is replaced with filler (hidden) characters (e.g., "..."), leaving open questions about how models internally process and represent reasoning steps. In this paper, we investigate methods to decode these hidden characters in transformer models trained with filler CoT sequences. By analyzing layer-wise representations using the logit lens method and examining token rankings, we demonstrate that the hidden characters can be recovered without loss of performance. Our findings provide insights into the internal mechanisms of transformer models and open avenues for improving interpretability and transparency in language model reasoning.
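
The logit lens analysis mentioned above can be illustrated with a minimal sketch. It is not the paper's code: the function name `logit_lens`, the three-token vocabulary, and the toy weights are all hypothetical, and the final layer normalization that a real transformer applies before unembedding is omitted for brevity. The idea is simply to project each layer's hidden vector through the unembedding matrix and look at how tokens are ranked at that layer.

```python
# Toy logit lens: project per-layer hidden states onto the vocabulary.
# All names and numbers here are illustrative assumptions, not the paper's setup.

VOCAB = [".", "1", "2"]  # index 0 plays the role of the filler token

# Hypothetical unembedding matrix: one row per vocabulary entry, width d = 2.
UNEMBED = [
    [1.0, 0.0],    # "."
    [0.0, 1.0],    # "1"
    [-1.0, -1.0],  # "2"
]

def logit_lens(hidden_states, unembed):
    """Return, for each layer, token ids sorted from highest to lowest logit."""
    rankings = []
    for h in hidden_states:
        # Logit of each token is the dot product of its unembedding row with h.
        logits = [sum(w * x for w, x in zip(row, h)) for row in unembed]
        order = sorted(range(len(logits)), key=lambda t: logits[t], reverse=True)
        rankings.append(order)
    return rankings

# Hidden state at one position, after an early layer and after a late layer.
hidden_per_layer = [
    [0.2, 1.0],  # early layer: the digit direction dominates
    [1.0, 0.8],  # late layer: the filler direction dominates
]

rankings = logit_lens(hidden_per_layer, UNEMBED)
for layer, order in enumerate(rankings):
    print(layer, [VOCAB[t] for t in order])
```

In this toy setup the digit is ranked first at the early layer and drops to rank 2 at the late layer, where the filler takes over. That is the kind of pattern the token-ranking analysis looks for; the numbers themselves carry no empirical meaning.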

Paper Structure

This paper contains 28 sections and 6 figures.

Figures (6)

  • Figure 1: Percentage of filler tokens among top predictions across layers
  • Figure 2: Comparison of decoding methods: Our method achieves higher accuracy in recovering hidden characters compared to random token replacement
  • Figure 3: Greedy Decoding: The model outputs filler tokens followed by the final answer
  • Figure 4: Greedy Decoding with Rank-2 Tokens
  • Figure 5: Our Method: Greedy Decoding with Filler Tokens Replaced by Rank-2 Tokens (Recovering hidden characters)
  • ...and 1 more figure
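
The decoding variants named in the figure captions can be sketched as follows. This is a hedged illustration rather than the authors' implementation: `greedy_decode`, `toy_next_logits`, the four-token vocabulary, and the logit table are hypothetical, and feeding the replaced token back into the context is an assumption of this sketch. Plain greedy decoding always takes the top-ranked token; the modified scheme takes the rank-2 token whenever the top-ranked token is the filler.

```python
# Toy comparison of greedy decoding with and without filler replacement.
# The "model" is a fixed logit table, purely for illustration.

VOCAB = [".", "A", "B", "True"]
FILLER_ID = 0

# Hypothetical next-token logits, indexed by how many tokens precede the step.
LOGIT_TABLE = [
    [5.0, 3.0, 1.0, 0.0],  # filler on top, "A" at rank 2
    [5.0, 1.0, 4.0, 0.0],  # filler on top, "B" at rank 2
    [0.0, 1.0, 2.0, 6.0],  # final answer on top
]

def toy_next_logits(tokens):
    """Stand-in for a model forward pass; ignores token content."""
    return LOGIT_TABLE[len(tokens) - 1]

def greedy_decode(next_logits, prompt, steps, filler_id=None):
    """Greedy decoding; if filler_id is given, swap a top-ranked filler for rank 2."""
    tokens = list(prompt)
    for _ in range(steps):
        logits = next_logits(tokens)
        order = sorted(range(len(logits)), key=lambda t: logits[t], reverse=True)
        choice = order[0]
        if filler_id is not None and choice == filler_id:
            choice = order[1]  # fall back to the second-ranked token
        tokens.append(choice)
    return tokens[len(prompt):]

prompt = [1]  # arbitrary one-token prompt
plain = greedy_decode(toy_next_logits, prompt, steps=3)
recovered = greedy_decode(toy_next_logits, prompt, steps=3, filler_id=FILLER_ID)

print([VOCAB[t] for t in plain])
print([VOCAB[t] for t in recovered])
```

Here the plain run emits fillers followed by the answer, while the modified run exposes the tokens that sat just below the filler and ends with the same answer. Whether a real model behaves this way depends on how it was trained, which is the question the paper examines.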