Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries

Eden Biran, Daniela Gottesman, Sohee Yang, Mor Geva, Amir Globerson

TL;DR

We explore latent multi-hop reasoning in large language models by constructing a Wikidata-based dataset of two-hop queries and applying Patchscopes to localize where first-hop resolution occurs. The findings reveal a sequential computation: the first hop is resolved into the bridge entity in early layers, which then enables the second hop to be resolved in later layers, with information propagating to the final token. To address failures, we propose back-patching, which injects later-layer representations into earlier layers: for 32–66% of previously incorrect cases, at least one back-patch recovers the correct answer, while correct answers are preserved. The work provides a dataset, analysis tools, and a practical technique to diagnose and improve latent reasoning in transformers, with implications for understanding internal computation and guiding future enhancements. Overall, the study advances mechanistic insight into how large pretrained models handle multi-hop queries and offers a concrete method to mitigate layer-limited bottlenecks.

Abstract

Large language models (LLMs) can solve complex multi-step problems, but little is known about how these computations are implemented internally. Motivated by this, we study how LLMs answer multi-hop queries such as "The spouse of the performer of Imagine is". These queries require two information extraction steps: a latent one for resolving the first hop ("the performer of Imagine") into the bridge entity (John Lennon), and another for resolving the second hop ("the spouse of John Lennon") into the target entity (Yoko Ono). Understanding how the latent step is computed internally is key to understanding the overall computation. By carefully analyzing the internal computations of transformer-based LLMs, we discover that the bridge entity is resolved in the early layers of the model. Then, only after this resolution, the two-hop query is solved in the later layers. Because the second hop commences in later layers, there could be cases where these layers no longer encode the necessary knowledge for correctly predicting the answer. Motivated by this, we propose a novel "back-patching" analysis method whereby a hidden representation from a later layer is patched back to an earlier layer. We find that in up to 66% of previously incorrect cases there exists a back-patch that results in the correct generation of the answer, showing that the later layers indeed sometimes lack the needed functionality. Overall, our methods and findings open further opportunities for understanding and improving latent reasoning in transformer-based LLMs.
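The back-patching idea can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the authors' implementation: the "model" is a hypothetical list of Python functions over per-position hidden-state vectors rather than a real transformer, index `l` denotes the hidden state after `l` layers have been applied, and `forward`, `back_patch`, `successful_back_patches`, `readout`, and the toy layers are all illustrative names. Following the abstract's wording that "there exists a back-patch", the search counts a case as recovered if at least one (source layer, target layer) pair makes the last position produce the correct answer.

```python
from typing import Callable, List, Tuple

Vec = List[float]
# A layer maps the whole sequence of hidden states to a new sequence, so it
# may mix information across positions (a stand-in for attention + MLP).
Layer = Callable[[List[Vec]], List[Vec]]


def forward(layers: List[Layer], states: List[Vec], start: int = 0) -> List[List[Vec]]:
    """Apply layers[start:] in order; out[k] is the sequence after k of them."""
    out = [states]
    for layer in layers[start:]:
        states = layer(states)
        out.append(states)
    return out


def back_patch(layers: List[Layer], clean: List[List[Vec]],
               src_layer: int, tgt_layer: int, pos: int = -1) -> List[Vec]:
    """Overwrite the state at `pos` after `tgt_layer` layers with the state at
    the same position after `src_layer` layers, then finish the forward pass.

    `clean` is forward(layers, embeddings): clean[l] = states after l layers.
    """
    assert 0 <= tgt_layer < src_layer < len(clean)
    patched = [list(v) for v in clean[tgt_layer]]  # copy so `clean` stays intact
    patched[pos] = list(clean[src_layer][pos])
    return forward(layers, patched, start=tgt_layer)[-1]


def successful_back_patches(layers: List[Layer], embeddings: List[Vec],
                            readout: Callable[[Vec], str], answer: str,
                            pos: int = -1) -> List[Tuple[int, int]]:
    """All (src_layer, tgt_layer) pairs whose back-patch makes the last
    position read out `answer`; a non-empty result means the case is fixable."""
    clean = forward(layers, embeddings)
    hits = []
    for src in range(1, len(layers) + 1):
        for tgt in range(src):  # only patch *back* into earlier layers
            final = back_patch(layers, clean, src, tgt, pos)
            if readout(final[-1]) == answer:
                hits.append((src, tgt))
    return hits


# Toy 3-layer "model" on two positions with 1-d states (purely illustrative).
TOY_LAYERS: List[Layer] = [
    lambda s: [s[0], [s[1][0] + s[0][0]]],  # last position reads position 0
    lambda s: [s[0], [s[1][0] * 2]],
    lambda s: [s[0], [s[1][0] - 3]],
]
TOY_EMB = [[1.0], [0.0]]


def readout(v: Vec) -> str:
    """Hypothetical decoder: the 'correct' answer corresponds to value 1.0."""
    return "target" if v[0] == 1.0 else "other"


clean = forward(TOY_LAYERS, TOY_EMB)
print(readout(clean[-1][-1]))  # other
print(successful_back_patches(TOY_LAYERS, TOY_EMB, readout, "target"))  # [(1, 0), (2, 1)]
```

In this toy, the clean pass answers wrongly, and two back-patches (source layer 1 into target layer 0, and source layer 2 into target layer 1) recover the intended output. With a real model, the same two steps (record hidden states in one pass, then overwrite one position at an earlier layer in a second pass) would typically be implemented with forward hooks on the transformer blocks.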

Paper Structure (28 sections, 11 figures, 5 tables)

Figures (11)

  • Figure 1: An illustration of our findings. We observe evidence of latent reasoning in two-hop queries: (1) during the early layers, the first hop is resolved and the source entity Imagine now encodes the bridge entity John Lennon; (2) during the middle layers, critical information propagates to the last position; (3) during the later layers, the second hop is resolved and the last token now encodes the target entity Yoko Ono. We additionally illustrate back-patching: patching a hidden representation from a later layer back into an earlier layer in order to fix cases where the pathway fails.
  • Figure 2: Percentage of cases per layer where target entities were first successfully decoded using Patchscopes. The percentages are out of all correctly answered cases for LLaMA 2 13B.
  • Figure 3: Heat-map of the layers where Patchscopes successfully decodes $e_2$ from the position of $t_1$. The percentages are out of all successful decodings for LLaMA 2 13B run on correctly answered cases.
  • Figure 4: A comparison of the first layers of each stage in the pathway between correct and incorrect cases for LLaMA 3 8B.
  • Figure 5: Heat-map of the layers where back-patching succeeds. The percentages are out of all successful back-patching instances for LLaMA 2 13B.
  • ...and 6 more figures
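Figures 2, 3, and 5 report per-layer percentages with different denominators: all correctly answered cases for Figure 2, and all successful decodings or back-patches for Figures 3 and 5. A minimal sketch of that bookkeeping follows, assuming a hypothetical input format of one list of per-layer success flags per case, or a list of successful layer pairs for a heat-map (assuming its axes are source and target layers). `first_success_layer` and `percent_by_key` are illustrative names, not taken from the paper's code, and the data below is made up.

```python
from collections import Counter
from collections.abc import Hashable
from typing import Dict, List, Optional, Sequence


def first_success_layer(flags: Sequence[bool]) -> Optional[int]:
    """Index of the first layer whose decoding succeeded, or None if none did."""
    for layer, ok in enumerate(flags):
        if ok:
            return layer
    return None


def percent_by_key(hits: List[Hashable], total: Optional[int] = None) -> Dict[Hashable, float]:
    """Percentage of `total` (default: len(hits)) that falls on each key.

    Keys can be single layers (a per-layer bar chart) or
    (source layer, target layer) pairs (heat-map cells).
    """
    denom = len(hits) if total is None else total
    if denom == 0:
        return {}
    return {key: 100.0 * n / denom for key, n in Counter(hits).items()}


# Hypothetical per-layer success flags for five correctly answered cases.
per_case = [
    [False, True, True, True],
    [False, True, False, False],
    [False, False, True, True],
    [False, False, False, True],
    [False, False, False, False],  # never decoded at any layer
]
firsts = [first_success_layer(flags) for flags in per_case]

# Figure-2 style: the denominator is every case, including the never-decoded one.
by_first_layer = percent_by_key([layer for layer in firsts if layer is not None],
                                total=len(per_case))
print(by_first_layer)  # {1: 40.0, 2: 20.0, 3: 20.0}

# Figure-3/5 style: the denominator is the successful layer pairs only.
heat = percent_by_key([(5, 2), (5, 2), (6, 1), (7, 3)])
print(heat)  # {(5, 2): 50.0, (6, 1): 25.0, (7, 3): 25.0}
```

Passing `total` explicitly keeps the choice of denominator visible at the call site, which matters because the same raw counts give different percentages under the two conventions.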