Emergent Search and Backtracking in Latent Reasoning Models

Jasmine Cui, Charles Ye

TL;DR

Latent reasoning transformers (LRTs) perform deliberation in hidden space and reveal stepwise belief trajectories when decoded at each iteration. Using a 3.5B-parameter looped model on a 260-item four-choice QA benchmark, the study uncovers a structured latent search with phases of exploration, shallow commitment, and potential backtracking, whose dynamics adapt to task difficulty. Backtracking occurs in $32\%$ of cases, improves accuracy by $34\%$, and is directed toward discarding the most semantically similar distractor (abandoning it in $72\%$ of backtracks) before possibly selecting the correct answer ($52\%$ end with the correct option). The findings show that latent computation can mirror chain-of-thought's corrective capabilities while offering direct interpretability, with implications for understanding adaptive computation and robustness in latent reasoning systems.

Abstract

What happens when a language model thinks without words? Standard reasoning LLMs verbalize intermediate steps as chain-of-thought; latent reasoning transformers (LRTs) instead perform deliberation entirely in continuous hidden space. We investigate an LRT, decoding the model's evolving beliefs at every step on a multiple-choice QA benchmark. We find that the model spontaneously learns a structured search process in latent space. Deliberation follows a consistent trajectory: an exploration phase where probability mass spreads across candidates, tentative commitment to a frontrunner, and either convergence or backtracking. Backtracking is prevalent (32% of instances), beneficial (34% accuracy gain over non-backtracking instances), and predominantly directed away from the semantically closest distractor toward the correct answer. The search is adaptive: replacing distractors with implausible alternatives shortens exploration by 54%. Latent reasoning models achieve in activation space what chain-of-thought achieves through words: the ability to be wrong, notice, and recover.
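One way to make the notion of backtracking concrete is to flag a trajectory whenever the option the model first commits to differs from the option it ends on. The sketch below is illustrative only: the abstract does not give the operational definition used in the paper, so the function name `find_backtrack`, the `commit_threshold` of 0.5, and the "first step above threshold" rule are all assumptions, not the authors' method.

```python
def find_backtrack(trajectory, commit_threshold=0.5):
    """Return (abandoned_option, final_option) if the trajectory backtracks, else None.

    trajectory: list of per-step probability lists over the answer options.
    Assumption (not from the paper): "commitment" is the first step at which
    the top option's probability reaches commit_threshold, and "backtracking"
    means the final top option differs from that committed option.
    """
    def top_option(probs):
        # Index of the highest-probability answer option at one step.
        return max(range(len(probs)), key=probs.__getitem__)

    committed = None
    for probs in trajectory:
        candidate = top_option(probs)
        if probs[candidate] >= commit_threshold:
            committed = candidate
            break

    if committed is None:
        # The model never committed, so there is nothing to backtrack from.
        return None

    final = top_option(trajectory[-1])
    return (committed, final) if final != committed else None
```

Under these assumptions, a trajectory that spreads mass evenly, commits to option 1, and ends on option 0 is reported as a backtrack from 1 to 0; a trajectory that commits to one option and stays there returns `None`.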

Paper Structure (16 sections, 3 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: Latent thought on a single question. The model explores candidates roughly uniformly, commits to the semantically closest distractor (fish), then backtracks to the correct answer (echinoderm). This trajectory (exploration, shallow commitment, error correction) is representative of the search dynamics we characterize across the benchmark. Shaded regions = 95% CIs over 25 random answer-order permutations.
  • Figure 2: Task difficulty modulates deliberation. Belief trajectories for the same question under three answer-set variants. Base (plausible distractors): the model explores before gradually converging. Easy (unrelated distractors): convergence is fast and confident. No correct answer: probability mass remains distributed and the model never commits. Shaded regions = 95% CIs over 25 answer-order permutations.
  • Figure 3: Entropy tracks task difficulty. Average entropy of $p_i$ across recurrence steps, by variant. Easy questions converge rapidly to low entropy. Base questions converge more slowly. No-correct-answer questions remain at high entropy: the model recognizes persistent uncertainty.
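
The entropy curve described for Figure 3 can be sketched as follows, assuming $p_i$ denotes the decoded probability distribution over the four answer options at recurrence step $i$ (the captions do not define it explicitly). The function names, the nested-list input layout, and the use of natural-log (nats) entropy are illustrative assumptions, not details from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one belief distribution over answer options."""
    # Zero-probability options contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy_per_step(trajectories):
    """Average entropy at each recurrence step across a set of trajectories.

    trajectories: list of trajectories; each trajectory is a list of per-step
    probability lists, and all trajectories are assumed to have equal length.
    """
    num_steps = len(trajectories[0])
    return [
        sum(entropy(traj[step]) for traj in trajectories) / len(trajectories)
        for step in range(num_steps)
    ]
```

For four options the entropy is bounded above by $\ln 4 \approx 1.386$ nats (a uniform distribution) and reaches 0 when all mass sits on one option, so a curve that stays near the upper bound corresponds to the persistent uncertainty reported for the no-correct-answer variant.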