Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure

Zirui Li, Xuefeng Bai, Kehai Chen, Yizhi Li, Jian Yang, Chenghua Lin, Min Zhang

TL;DR

This work reframes latent chain-of-thought as a stepwise causal process in representation space and introduces an intervention-based framework to quantify how intermediate latent states influence final predictions. By applying do-interventions and readouts to two latent-reasoning paradigms, Coconut and CODI, across mathematical and general reasoning tasks, it reveals heterogeneous stepwise leverage, non-local information flow, and a persistent gap between early output bias and late representational commitment. The findings motivate designing training and decoding objectives that shape latent routing and bottlenecks rather than simply increasing latent depth, with implications for more stable and faithful latent reasoning systems. Overall, the study provides a principled, causal, and actionable perspective on interpreting and improving latent CoT dynamics in large language models.

Abstract

Latent or continuous chain-of-thought methods replace explicit textual rationales with a number of internal latent steps, but these intermediate computations are difficult to evaluate beyond correlation-based probes. In this paper, we view latent chain-of-thought as a manipulable causal process in representation space by modeling latent steps as variables in a structural causal model (SCM) and analyzing their effects through step-wise $\mathrm{do}$-interventions. We study two representative paradigms (i.e., Coconut and CODI) on both mathematical and general reasoning tasks to investigate three key questions: (1) which steps are causally necessary for correctness and when answers become decidable early; (2) how does influence propagate across steps, and how does this structure compare to explicit CoT; and (3) do intermediate trajectories retain competing answer modes, and how does output-level commitment differ from representational commitment across steps. We find that latent-step budgets behave less like homogeneous extra depth and more like staged functionality with non-local routing, and we identify a persistent gap between early output bias and late representational commitment. These results motivate mode-conditional and stability-aware analyses -- and corresponding training/decoding objectives -- as more reliable tools for interpreting and improving latent reasoning systems.

Paper Structure (47 sections, 16 equations, 17 figures, 1 table)

Figures (17)

  • Figure 1: Overview of step-centric research questions for latent CoT. RQ1 tests step necessity and early decodability; RQ2 characterizes step-to-step influence propagation; RQ3 probes trajectory-level superposition and commitment across rollouts.
  • Figure 2: Intervention-based protocol for latent CoT as a causal system. (I) Variables. Input $X$ induces a latent trajectory $\{h_t\}_{t=1}^{T}$ and output $Y$; an intervention operator implements step-wise $\mathrm{do}(h_t \leftarrow \tilde{h}_t)$; a readout maps intermediate states to answer support. (II) Standard propagation. Unperturbed dynamics from $X$ through steps to $Y$. (III) Step-wise intervention. We surgically replace a single step state while keeping downstream computation intact, yielding an intervened outcome $\tilde{Y}$ (RQ1). (IV) Early-stop decoding. We truncate latent computation after step $k$ and decode from $h_k$ to test when correctness becomes decodable (RQ1). (V) Influence estimation. Combining a step-$t$ intervention with an early readout at step $s$ yields directed propagation strengths $W_{t,s}$ summarized as an empirical influence structure (RQ2). (VI) Step-wise readouts. We read out answer competition from $h_t$ (e.g., teacher forcing or a fixed probe) to characterize superposition and commitment (RQ3).
  • Figure 3: Step-wise necessity measured by decision instability. We intervene at a single latent step $t\in\{1,\dots,6\}$ by zeroing its state, $\mathrm{do}(h_t:=\mathbf{0})$, and then decode the final answer. We report the flip rate $\mathrm{Flip}(t)$, i.e., the fraction of examples whose decoded prediction changes relative to the baseline, on CommonsenseQA (left) and GSM8K (right). Error bars indicate estimation uncertainty.
  • Figure 4: Early-stop decoding reveals when correctness becomes decodable. We report the cumulative solved fraction $S(k)=\mathbb{P}(k_i\le k)$ (Eq. \ref{def:earlystop_summaries}) under early-stop decoding on CommonsenseQA (left) and GSM8K (right), where $k_i$ is the earliest step at which the correct answer becomes decodable (Eq. \ref{def:earliest_correct}).
  • Figure 5: Explicit CoT principal influence graphs (GSM8K; CoT-SFT baselines). Nodes denote the first $T{=}6$ segmented CoT steps. Edge $t\!\to\!s$ indicates propagation strength $W_{t,s}$ from Eq. \ref{eq:rq2_W} (teacher-forced KL shift on the gold answer under a single-step intervention at $t$ and readout at $s$). For readability we show only top-1 outgoing edges after thresholding at $\alpha{=}0.1\cdot\max(W)$.
  • ...and 12 more figures
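
The quantities named in the captions for Figures 2 to 5 can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the toy `step_fn` interface, the treatment of never-solved examples as `None`, and the non-strict threshold comparison are all assumptions made for illustration. Only the definitions themselves (single-step replacement with downstream computation left intact, $\mathrm{Flip}(t)$ as the fraction of changed predictions, $S(k)=\mathbb{P}(k_i\le k)$, and top-1 outgoing edges after thresholding at $0.1\cdot\max(W)$) come from the captions.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


def rollout(x: Any, step_fn: Callable[[Any, List[Any]], Any], num_steps: int,
            do: Optional[Dict[int, Any]] = None) -> List[Any]:
    """Produce latent states h_1..h_T; `do` maps a step index t to a replacement state.

    `step_fn(x, previous_states)` is a hypothetical stand-in for one latent step.
    It receives all earlier states, so later steps may depend on any earlier one.
    """
    do = do or {}
    states: List[Any] = []
    for t in range(1, num_steps + 1):
        h = step_fn(x, states)
        if t in do:
            h = do[t]  # do(h_t <- replacement); later steps see the replaced state
        states.append(h)
    return states


def flip_rate(baseline: Sequence[Any], intervened: Sequence[Any]) -> float:
    """Fraction of examples whose decoded prediction differs from the baseline."""
    if len(baseline) != len(intervened) or not baseline:
        raise ValueError("need two non-empty prediction lists of equal length")
    return sum(b != a for b, a in zip(baseline, intervened)) / len(baseline)


def cumulative_solved(earliest: Sequence[Optional[int]], num_steps: int) -> List[float]:
    """S(k) for k = 1..num_steps; earliest[i] is k_i, or None if never decodable."""
    if not earliest:
        raise ValueError("need at least one example")
    n = len(earliest)
    return [sum(1 for k_i in earliest if k_i is not None and k_i <= k) / n
            for k in range(1, num_steps + 1)]


def principal_edges(W: Sequence[Sequence[float]],
                    alpha_frac: float = 0.1) -> List[Tuple[int, int, float]]:
    """Keep the strongest outgoing edge per source step if it reaches alpha_frac * max(W).

    W[t][s] holds the propagation strength from step t+1 to step s+1;
    unmeasured pairs are assumed to be stored as 0.0.
    """
    threshold = alpha_frac * max(max(row) for row in W)
    edges: List[Tuple[int, int, float]] = []
    for t, row in enumerate(W):
        s = max(range(len(row)), key=lambda j: row[j])
        if row[s] > 0 and row[s] >= threshold:
            edges.append((t + 1, s + 1, row[s]))  # report 1-based step indices
    return edges


if __name__ == "__main__":
    # Toy dynamics: each state is the input plus the sum of all earlier states.
    toy_step = lambda x, hs: x + sum(hs)
    print(rollout(1, toy_step, 3))             # unperturbed trajectory
    print(rollout(1, toy_step, 3, do={1: 0}))  # zero the first step, keep the rest
    print(flip_rate(["A", "B", "C", "D"], ["A", "C", "C", "A"]))
    print(cumulative_solved([1, 3, None, 2], 3))
```

In a real model, `step_fn` would be a forward pass that emits the next latent state, and the predictions passed to `flip_rate` would come from decoding after the intervened rollout. The toy integers here only show how the pieces fit together.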