Lost-in-Distance: Impact of Contextual Proximity on LLM Performance in Graph Tasks

Hamed Firooz, Maziar Sanjabi, Wenlong Jiang, Xiaoling Zhai

TL;DR

This work identifies an underappreciated interaction between contextual proximity and cross-subgraph reasoning in LLMs, which the authors call the lost-in-distance phenomenon in graph tasks. By evaluating edge existence, common connection, and similarity tasks with three graph encodings across three LLMs, the study shows that accuracy deteriorates as the distance between relevant pieces of information grows, and that this effect compounds with the known lost-in-the-middle bias. The authors formalize a distance-aware model $F(p_1,p_2) = \gamma\, G(p_1)\, G(p_2)\, H(d)$ to separate middle and distance effects, reporting a superior fit and a distance-dependent accuracy decline of up to 6x that is robust across graph densities and encodings. These findings highlight fundamental limits of current LLMs in graph reasoning and motivate improved graph representations and prompting strategies for practical domains such as recommendation, molecular design, and multi-hop reasoning.

Abstract

Despite significant advancements, Large Language Models (LLMs) exhibit blind spots that impair their ability to retrieve and process relevant contextual data effectively. We demonstrate that LLM performance in graph tasks with complexities beyond the "needle-in-a-haystack" scenario, where solving the problem requires cross-referencing and reasoning across multiple subproblems jointly, is influenced by the proximity of relevant information within the context, a phenomenon we term "lost-in-distance". We examine two fundamental graph tasks, identifying common connections between two nodes and assessing similarity among three nodes, and show that the model's performance in these tasks depends significantly on the relative positioning of common edges. We evaluate three publicly available LLMs using various graph encoding techniques that represent graph structures for LLM input. We propose a formulation for the lost-in-distance phenomenon and demonstrate that the lost-in-distance and lost-in-the-middle phenomena occur independently. Results indicate that model accuracy can decline by up to 6x as the distance between node connections increases, independent of graph encoding and model size.
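
The multiplicative form $F(p_1,p_2) = \gamma\, G(p_1)\, G(p_2)\, H(d)$ quoted in the TL;DR can be illustrated with a small Python sketch. The source does not give the functional forms of $G$ and $H$ here, so everything below is an assumption made for illustration: `position_factor` is a hypothetical U-shaped $G$ (lowest in the middle of the context), `distance_factor` is a hypothetical exponentially decaying $H$, $d$ is taken to be $|p_1 - p_2|$, and the parameters `depth` and `rate` are invented names, not values from the paper.

```python
import math


def position_factor(p, n_positions, depth=0.3):
    """Hypothetical G(p): U-shaped, equal to 1 at the ends, lowest in the middle."""
    mid = (n_positions - 1) / 2
    # Normalized distance from the middle of the context, in [0, 1].
    x = abs(p - mid) / mid if mid > 0 else 1.0
    return 1.0 - depth * (1.0 - x)


def distance_factor(d, rate=0.25):
    """Hypothetical H(d): decays exponentially as the distance d grows."""
    return math.exp(-rate * d)


def predicted_accuracy(p1, p2, n_positions, gamma=1.0):
    """F(p1, p2) = gamma * G(p1) * G(p2) * H(d), with d assumed to be |p1 - p2|."""
    d = abs(p1 - p2)
    g1 = position_factor(p1, n_positions)
    g2 = position_factor(p2, n_positions)
    return gamma * g1 * g2 * distance_factor(d)


if __name__ == "__main__":
    # Compare a close pair of positions with a distant pair in a 6-slot context.
    print(predicted_accuracy(0, 1, 6))
    print(predicted_accuracy(0, 5, 6))
```

Because the position terms and the distance term enter as separate factors, changing `rate` alters only the distance effect and changing `depth` alters only the middle effect, which is the sense in which a model of this shape can treat the two as independent.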

Paper Structure (25 sections, 5 equations, 13 figures, 5 tables)

Figures (13)

  • Figure 1: Three graph encoding functions, with node $0$ and node $1$ serving as the nodes of interest. The figure is inspired by Fatemi et al. (2024).
  • Figure 2: Example of the edge existence task, illustrating the placement of the nodes of interest subgraph (nodes $208$ and $358$) at (a) the beginning, (b) the middle, and (c) the end of the graph structure.
  • Figure 3: The effect of the position of the relevant information on the edge existence task.
  • Figure 4: An example illustrating the placement of relevant information, highlighted in blue and red, at different positions using the adjacency encoding function for the common connection task. Relevant information is grouped at positions $0$, $1$, or $2$ within the first node's (node $257$) subgraph and at positions $3$, $4$, or $5$ within the second node's (node $462$) subgraph. The left plot depicts the smallest distance between relevant information, while the right plot shows the largest distance.
  • Figure 5: The effect of lost-in-distance on the common connection task. The number in each block is accuracy $\pm$ standard deviation.
  • ...and 8 more figures
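
The setups described in the captions for Figures 2 and 4 (placing the relevant subgraph at different positions in the encoded graph, then asking for common connections) can be sketched roughly as follows. This is a toy illustration, not the authors' code: the sentence template in `encode_edges`, the helper names, and the example graph are all hypothetical.

```python
def common_connections(edges, u, v):
    """Ground truth: nodes adjacent to both u and v in an undirected graph."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return (neighbors.get(u, set()) & neighbors.get(v, set())) - {u, v}


def encode_edges(edges):
    """Hypothetical text encoding: one sentence per edge."""
    return [f"Node {a} is connected to node {b}." for a, b in edges]


def place_relevant(relevant, filler, where):
    """Insert the relevant lines at the beginning, middle, or end of the filler lines."""
    if where == "beginning":
        idx = 0
    elif where == "middle":
        idx = len(filler) // 2
    elif where == "end":
        idx = len(filler)
    else:
        raise ValueError(f"unknown position: {where!r}")
    return filler[:idx] + relevant + filler[idx:]


if __name__ == "__main__":
    # Toy graph: nodes 0 and 1 are the nodes of interest.
    relevant_edges = [(0, 2), (1, 2), (0, 3), (1, 4)]
    filler_edges = [(5, 6), (6, 7), (7, 8), (8, 9)]
    lines = place_relevant(
        encode_edges(relevant_edges), encode_edges(filler_edges), "middle"
    )
    print("\n".join(lines))
    print("Common connections of 0 and 1:",
          common_connections(relevant_edges + filler_edges, 0, 1))
```

Varying `where`, or splitting the relevant lines into two groups inserted at different indices, gives a simple way to control both the absolute position of the relevant information and the distance between its parts.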