A Formal Comparison Between Chain-of-Thought and Latent Thought
Kevin Xu, Issei Sato
TL;DR
The paper formalizes and contrasts two Transformer-based reasoning paradigms: Chain-of-Thought (CoT), which generates explicit intermediate steps token by token, and Latent Thought in Looped Transformers (Looped TF), which iteratively refines latent representations without decoding them into tokens. In the deterministic setting, it proves a polylogarithmic-time separation. Latent Thought can evaluate a computation graph in a number of loop iterations proportional to the graph's depth, because it processes all nodes at the same depth in parallel, whereas CoT needs a number of steps proportional to the graph's size. This result links Looped TF to circuit classes such as AC^k and TC^k. In the stochastic setting, the authors show that CoT with stochastic decoding can realize randomized approximation schemes (FPAUS/FPRAS) for self-reducible problems, whereas Looped TF cannot under the standard complexity assumption FPTAS ⊊ FPRAS. Experiments on parallelizable tasks and on a DNF-counting approximation task corroborate the theory and illustrate the practical trade-offs: Latent Thought excels at parallel computation with modest loop counts, while CoT uses stochastic sampling to tackle hard combinatorial problems. The work offers a principled guide for choosing a reasoning paradigm according to whether a problem favors parallel latent-space computation or probabilistic approximation through reasoning steps, with implications for the design of future hierarchical or hardware-accelerated reasoning systems.
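The depth-versus-size contrast can be illustrated with a toy step count. The sketch below is a hypothetical illustration, not the paper's construction and not a model of a Transformer. It evaluates a computation graph either one node per step (mimicking CoT emitting one intermediate result per token) or one depth level per iteration (mimicking a loop that updates every node at the same depth in parallel). The graph encoding and the names `topo_levels`, `eval_sequential`, `eval_parallel`, and `build_sum_tree` are assumptions made for this example.

```python
import operator
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical encoding: node name -> (operation, input node names).
# Leaves use (None, []) and get their values from a separate dict.
Node = Tuple[Optional[Callable[..., int]], List[str]]
Graph = Dict[str, Node]


def topo_levels(graph: Graph) -> List[List[str]]:
    """Group nodes by depth: leaves are level 0, others 1 + max input level."""
    level: Dict[str, int] = {}

    def depth(name: str) -> int:
        if name not in level:
            _, args = graph[name]
            level[name] = 0 if not args else 1 + max(depth(a) for a in args)
        return level[name]

    for name in graph:
        depth(name)
    layers: List[List[str]] = [[] for _ in range(max(level.values()) + 1)]
    for name, d in level.items():
        layers[d].append(name)
    return layers


def eval_sequential(graph: Graph, leaves: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """CoT-like: compute one internal node per step; return values and step count."""
    values = dict(leaves)
    steps = 0
    for layer in topo_levels(graph)[1:]:
        for name in layer:
            op, args = graph[name]
            values[name] = op(*(values[a] for a in args))
            steps += 1
    return values, steps


def eval_parallel(graph: Graph, leaves: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Loop-like: update every node of one depth level per iteration."""
    values = dict(leaves)
    iterations = 0
    for layer in topo_levels(graph)[1:]:
        # Nodes in one layer only read earlier layers, so these updates are
        # mutually independent and could run simultaneously.
        updates = {}
        for name in layer:
            op, args = graph[name]
            updates[name] = op(*(values[a] for a in args))
        values.update(updates)
        iterations += 1
    return values, iterations


def build_sum_tree(k: int) -> Tuple[Graph, Dict[str, int]]:
    """Balanced binary tree summing leaves x_i = i for i < 2**k."""
    leaves = {f"x{i}": i for i in range(2 ** k)}
    graph: Graph = {name: (None, []) for name in leaves}
    frontier = list(leaves)
    level = 0
    while len(frontier) > 1:
        level += 1
        nxt = []
        for j in range(0, len(frontier), 2):
            name = f"s{level}_{j // 2}"
            graph[name] = (operator.add, [frontier[j], frontier[j + 1]])
            nxt.append(name)
        frontier = nxt
    return graph, leaves


if __name__ == "__main__":
    g, x = build_sum_tree(3)
    print(eval_sequential(g, x)[1], eval_parallel(g, x)[1])  # 7 3
```

On a balanced binary tree summing 2^k leaves, the sequential count equals the number of internal nodes, 2^k - 1, while the parallel count equals the depth k. For k = 3 that is 7 steps versus 3 iterations, and the gap widens exponentially as k grows.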
Abstract
Chain-of-Thought (CoT) elicits reasoning in large language models by explicitly generating intermediate steps in natural language. In contrast, Latent Thought in looped models operates directly in the continuous latent space, enabling computation beyond discrete linguistic representations. While both approaches exploit iterative computation, their comparative capabilities remain underexplored. In this work, we present a formal analysis showing that Latent Thought in Looped Transformers enables parallel computation, which is more efficient than the inherently sequential process of CoT. In contrast, CoT leverages stochastic decoding to approximate solutions to problems where exact computation is intractable. These separations suggest the tasks for which depth-driven recursion is more suitable, thereby offering practical guidance for choosing between reasoning paradigms. Code is available at https://github.com/kevin671/cot-vs-loop.
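For context on what a randomized approximation scheme for an intractable counting problem looks like, here is the classical Karp–Luby Monte Carlo estimator for counting satisfying assignments of a DNF formula, a #P-complete problem. This is a standard textbook algorithm, not the paper's CoT construction. For m clauses, a sampled (clause, assignment) pair is counted with probability at least 1/m, so O(m/ε² · log(1/δ)) samples suffice for a (1 ± ε)-approximation with probability at least 1 − δ. The clause encoding and the function names are assumptions, and the sketch assumes no clause mentions the same variable twice.

```python
import itertools
import random
from typing import List, Sequence, Tuple

# Hypothetical encoding: a literal is (variable index, required truth value);
# a clause is a conjunction of literals; a DNF is a list of clauses.
# Assumes no clause mentions the same variable twice.
Clause = List[Tuple[int, bool]]


def satisfies(assign: Sequence[bool], clause: Clause) -> bool:
    return all(assign[v] == val for v, val in clause)


def exact_count(n: int, dnf: List[Clause]) -> int:
    """Brute force over all 2**n assignments (only feasible for small n)."""
    return sum(
        any(satisfies(a, c) for c in dnf)
        for a in itertools.product([False, True], repeat=n)
    )


def karp_luby(n: int, dnf: List[Clause], samples: int, rng: random.Random) -> float:
    """Unbiased Monte Carlo estimate of the number of satisfying assignments."""
    # Clause i with |C_i| literals has exactly 2**(n - |C_i|) satisfying assignments.
    sizes = [2 ** (n - len(c)) for c in dnf]
    total = sum(sizes)
    hits = 0
    for _ in range(samples):
        # Pick clause i with probability sizes[i] / total ...
        i = rng.choices(range(len(dnf)), weights=sizes)[0]
        # ... then a uniform assignment among those satisfying clause i.
        assign = [rng.random() < 0.5 for _ in range(n)]
        for v, val in dnf[i]:
            assign[v] = val
        # Count the pair only if i is the first clause it satisfies, so each
        # satisfying assignment of the whole DNF is counted exactly once.
        if not any(satisfies(assign, dnf[j]) for j in range(i)):
            hits += 1
    return total * hits / samples
```

For example, with n = 6 and the clauses x0∧x1, ¬x1∧x2, and x3∧¬x4∧x5, `exact_count` returns 36, and `karp_luby(6, dnf, 20000, random.Random(0))` should land within a few percent of that value while only ever sampling, never enumerating, assignments.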
