Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning
Zhuoyuan Hao, Zhuo Li, Wu Li, Fangming Liu, Min Zhang, Jing Li
TL;DR
The paper identifies the Echo of Prompt (EOP) as a spontaneous front-loaded repetition in large reasoning models and formalizes its cost via a rejection-sampling framework, introducing the Echo Likelihood Gap $\Delta\mathcal{L}$ as a proxy for the echo’s trade-off with downstream accuracy. It demonstrates that EOP acts as an attention refocusing mechanism, with increased within-trace attention to the answer prefix in mid layers correlating with correctness. To harness this phenomenon, the authors propose Echo-Distilled SFT (ED-SFT), which trains models to adopt an echo-then-reason pattern, and Echoic Prompting (EP), a training-free method that re-grounds the model on the original prompt during inference. Across GSM8K, MathQA, Hendrycks-MATH, AIME24, and MATH-500, both ED-SFT and EP yield consistent performance gains and generalize across architectures, supporting the view of EOP as a beneficial cognitive primitive rather than a mere flaw. The work offers mechanistic explanations, including mid-layer attention dynamics and information-flow pathways, and provides practical guidance for cultivating robust self-aligned reasoning in LRMs.
Abstract
Test-time compute allocation in large reasoning models (LRMs) is widely used and has applications in mathematical problem solving, code synthesis, and planning. Recent work has addressed this problem by scaling self-consistency and parallel thinking, adding generic ``thinking tokens'', and prompting models to re-read the question before answering. Unfortunately, these approaches either inject task-agnostic tokens or mandate heuristics that do not explain, and often ignore, the \emph{spontaneous} repetition that many LRMs exhibit at the head of their internal chains. In contrast, we analyze and harness the model's tendency to restate the question, which we term the \emph{Echo of Prompt (EOP)}, as a front-loaded, compute-shaping mechanism. We formalize its probabilistic cost by casting echo removal as rejection-based conditioning and defining the \emph{Echo Likelihood Gap} $\Delta\mathcal{L}$ as a computable proxy. This provides the missing theoretical link connecting early repetition to likelihood gains and downstream accuracy. However, it does not by itself specify how to exploit EOP. Consequently, we develop \emph{Echo-Distilled SFT (ED-SFT)} to instill an ``echo-then-reason'' pattern through supervised finetuning, and \emph{Echoic Prompting (EP)} to re-ground the model mid-trace without training. While these methods are promising, quantifying benefits beyond verbosity is non-trivial. Therefore, we conduct length- and suffix-controlled likelihood analyses together with layer-wise attention studies, showing that EOP increases answer-to-answer-prefix attention in middle layers, consistent with an \emph{attention refocusing} mechanism. We evaluate on GSM8K, MathQA, Hendrycks-MATH, AIME24, and MATH-500 under identical decoding settings and budgets, and find consistent gains over baselines. Code is available at https://github.com/hhh2210/echoes-as-anchors.
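To make the idea of re-grounding mid-trace concrete, the following is a minimal sketch of how an echoic continuation could be assembled as plain text. It is an illustration only, not the authors' implementation: the function name `build_echoic_continuation`, the cue wording, and the choice to restate the prompt verbatim are all assumptions, and the abstract does not specify where in the trace the echo is placed.

```python
def build_echoic_continuation(
    prompt: str,
    partial_trace: str,
    cue: str = "Let me restate the question: ",  # hypothetical cue wording
) -> str:
    """Assemble text that re-grounds a partial reasoning trace on the prompt.

    Assumption: the echo is a verbatim restatement of the original prompt,
    appended after the reasoning produced so far. The model would then be
    asked to continue generating from the returned string.
    """
    # Drop trailing whitespace so the echo starts on a clean new line.
    trace = partial_trace.rstrip()
    return f"{prompt}\n{trace}\n{cue}{prompt.strip()}\n"


# Hypothetical usage: the returned string is the new generation context.
context = build_echoic_continuation(
    "What is 2+3?",
    "First, note that 2 is even.  ",
)
```

The returned string would be passed back to the model as its context for further decoding; how the cue is phrased and when it is injected are design choices that this sketch leaves open.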
