A Survey on Latent Reasoning

Rui-Jie Zhu, Tianhao Peng, Tianhao Cheng, Xingwei Qu, Jinfa Huang, Dawei Zhu, Hao Wang, Kaiwen Xue, Xuanliang Zhang, Yong Shan, Tianle Cai, Taylor Kergan, Assel Kembay, Andrew Smith, Chenghua Lin, Binh Nguyen, Yuqi Pan, Yuhong Chou, Zefan Cai, Zhenhe Wu, Yongchi Zhao, Tianyu Liu, Jian Yang, Wangchunshu Zhou, Chujie Zheng, Chongxuan Li, Yuyin Zhou, Zhoujun Li, Zhaoxiang Zhang, Jiaheng Liu, Ge Zhang, Wenhao Huang, Jason Eshraghian

TL;DR

Latent CoT reframes reasoning by shifting from discrete token-level intermediate steps to continuous hidden states, potentially expanding expressive capacity beyond token-based CoT. The survey develops a unified framework classifying latent CoT into vertical activation-based recurrence and horizontal hidden-state recurrence, and reviews diffusion-based and training-induced variants. It also covers mechanistic interpretability, theoretical notions of layer-based reasoning, and frontier ideas for infinite-depth reasoning via diffusion and optimization-based frameworks. The work highlights the need for standardized benchmarks and integration of architectural and training-based approaches to realize scalable, interpretable latent reasoning in future LLMs.

Abstract

Large Language Models (LLMs) have demonstrated impressive reasoning capabilities, especially when guided by explicit chain-of-thought (CoT) reasoning that verbalizes intermediate steps. While CoT improves both interpretability and accuracy, its dependence on natural language reasoning limits the model's expressive bandwidth. Latent reasoning tackles this bottleneck by performing multi-step inference entirely in the model's continuous hidden state, eliminating token-level supervision. To advance latent reasoning research, this survey provides a comprehensive overview of the emerging field of latent reasoning. We begin by examining the foundational role of neural network layers as the computational substrate for reasoning, highlighting how hierarchical representations support complex transformations. Next, we explore diverse latent reasoning methodologies, including activation-based recurrence, hidden state propagation, and fine-tuning strategies that compress or internalize explicit reasoning traces. Finally, we discuss advanced paradigms such as infinite-depth latent reasoning via masked diffusion models, which enable globally consistent and reversible reasoning processes. By unifying these perspectives, we aim to clarify the conceptual landscape of latent reasoning and chart future directions for research at the frontier of LLM cognition. An associated GitHub repository collecting the latest papers and repos is available at: https://github.com/multimodal-art-projection/LatentCoT-Horizon/.

Paper Structure

This paper contains 56 sections, 12 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Explicit reasoning transmits discrete tokens ($\approx 15\,\text{bits}$ each), whereas latent reasoning exchanges full 2560-dimensional FP16 hidden states ($\approx 40{,}960\,\text{bits}$ each), revealing a $\sim\!2.7\times10^{3}$-fold bandwidth gap between the two approaches.
  • Figure 2: Taxonomy of Latent Reasoning.
  • Figure 3: Comparison of Activation-Based and Hidden-State-Based Latent Reasoning. Activation-based methods (left) iteratively refine representations by looping through the same layers over multiple time steps ($T=1, 2, \ldots, N$), increasing computational depth. Hidden-state-based methods (right) process information sequentially, evolving a hidden state that carries information across a potentially long temporal sequence ($T=1, 2, \ldots, N$).
  • Figure 4: Conceptual diagram of a Pre/Loop/Coda architecture with per-iteration input $x_t$, hidden state $S_t$ (KV-cache), depth embedding $d_t$, and a dynamic-stop gate.
  • Figure 5: An evolutionary graph of the text diffusion models, including three architectural families: Masked Diffusion Models, Embedding-based Diffusion Models, and Hybrid AR-Diffusion Models.
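
The bandwidth figures quoted in the Figure 1 caption can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the vocabulary size of 32,768 is an assumption chosen because its base-2 logarithm matches the caption's roughly 15 bits per token; the caption itself does not state a vocabulary size.

```python
import math


def token_bits(vocab_size: int) -> float:
    """Upper bound on bits carried by one discrete token: log2(vocabulary size)."""
    return math.log2(vocab_size)


def hidden_state_bits(dim: int, bits_per_value: int) -> int:
    """Raw bit width of one hidden-state vector with `dim` entries."""
    return dim * bits_per_value


# Assumption (not from the caption): a 32,768-entry vocabulary gives 15 bits/token.
assumed_vocab_size = 32_768
explicit_bits = token_bits(assumed_vocab_size)

# From the caption: 2560-dimensional hidden state stored in FP16 (16 bits per value).
latent_bits = hidden_state_bits(dim=2560, bits_per_value=16)

ratio = latent_bits / explicit_bits

print(f"explicit: {explicit_bits:.0f} bits per token")
print(f"latent:   {latent_bits} bits per hidden state")
print(f"ratio:    about {ratio:.0f}x")
```

Under these assumptions the ratio comes out near 2.7 × 10³, in line with the caption. Note that this compares raw storage widths; it is an upper bound on capacity, not a measurement of how much information a model actually places in a hidden state.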