Emergent Search and Backtracking in Latent Reasoning Models
Jasmine Cui, Charles Ye
TL;DR
Latent reasoning transformers (LRTs) deliberate in hidden space, and decoding their state at each iteration reveals a stepwise belief trajectory. Using a 3.5B-parameter looped model on a 260-item four-choice QA benchmark, the study uncovers a structured latent search with phases of exploration, tentative commitment, and either convergence or backtracking, whose dynamics adapt to task difficulty. Backtracking occurs in 32% of cases and is associated with a 34% accuracy gain over non-backtracking instances. It is directed: 72% of backtracks abandon the most semantically similar distractor, and 52% end on the correct option. The findings show that latent computation can mirror chain-of-thought's corrective capabilities while offering direct interpretability, with implications for understanding adaptive computation and robustness in latent reasoning systems.
Abstract
What happens when a language model thinks without words? Standard reasoning LLMs verbalize intermediate steps as chain-of-thought; latent reasoning transformers (LRTs) instead perform deliberation entirely in continuous hidden space. We investigate an LRT, decoding the model's evolving beliefs at every step on a multiple-choice QA benchmark. We find that the model spontaneously learns a structured search process in latent space. Deliberation follows a consistent trajectory: an exploration phase where probability mass spreads across candidates, tentative commitment to a frontrunner, and either convergence or backtracking. Backtracking is prevalent (32% of instances), beneficial (34% accuracy gain over non-backtracking instances), and predominantly directed away from the semantically closest distractor toward the correct answer. The search is adaptive: replacing distractors with implausible alternatives shortens exploration by 54%. Latent reasoning models achieve in activation space what chain-of-thought achieves through words: the ability to be wrong, notice, and recover.
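To make the notion of a belief trajectory concrete, the following is a minimal sketch of how backtracking could be flagged once per-step answer probabilities have been decoded. It is not the procedure used in this work: the function names, the `min_hold` threshold, and the operational definition (a switch away from a frontrunner held for at least `min_hold` consecutive steps) are all illustrative assumptions, and the example probabilities are made up.

```python
from typing import List, Sequence


def frontrunners(trajectory: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the highest-probability option at each latent step."""
    return [max(range(len(p)), key=p.__getitem__) for p in trajectory]


def count_backtracks(trajectory: Sequence[Sequence[float]], min_hold: int = 2) -> int:
    """Count switches away from a frontrunner held for >= min_hold steps.

    ASSUMPTION: this operational definition of backtracking is hypothetical
    and chosen only for illustration.
    """
    leaders = frontrunners(trajectory)
    backtracks = 0
    run = 1  # length of the current run of identical frontrunners
    for prev, cur in zip(leaders, leaders[1:]):
        if cur == prev:
            run += 1
        else:
            if run >= min_hold:
                backtracks += 1  # a held commitment was abandoned
            run = 1
    return backtracks


# Made-up trajectory over four options (A, B, C, D) across five latent steps.
example = [
    [0.26, 0.25, 0.25, 0.24],  # exploration: mass spread across candidates
    [0.30, 0.40, 0.20, 0.10],  # tentative commitment to B
    [0.20, 0.60, 0.10, 0.10],  # B still leads
    [0.55, 0.30, 0.10, 0.05],  # switch from B to A
    [0.80, 0.10, 0.05, 0.05],  # convergence on A
]
n_backtracks = count_backtracks(example)
```

Raising `min_hold` makes the detector stricter, so brief flickers during the exploration phase are not counted as abandoned commitments.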
