Microstructures and Accuracy of Graph Recall by Large Language Models

Yanbang Wang; Hejie Cui; Jon Kleinberg

TL;DR

This work presents the first systematic study of graph recall by LLMs, investigating both the accuracy of their recall and its biased microstructures (local structural patterns). It finds that more advanced LLMs depend strikingly on the domain that a real-world graph comes from.

Abstract

Graph data is crucial for many applications, and much of it exists as relations described in text. As a result, being able to accurately recall and encode a graph described in earlier text is a basic yet pivotal ability that LLMs need to demonstrate if they are to perform reasoning tasks that involve graph-structured information. Human performance at graph recall has been studied by cognitive scientists for decades, and has been found to often exhibit certain structural patterns of bias that align with human handling of social relationships. To date, however, we know little about how LLMs behave in analogous graph recall tasks: do their recalled graphs also exhibit certain biased patterns, and if so, how do they compare with humans and affect other graph reasoning tasks? In this work, we perform the first systematic study of graph recall by LLMs, investigating the accuracy and biased microstructures (local structural patterns) in their recall. We find that LLMs not only often underperform in graph recall, but also tend to favor more triangles and alternating 2-paths. Moreover, we find that more advanced LLMs have a striking dependence on the domain that a real-world graph comes from: they yield the best recall accuracy when the graph is narrated in a language style consistent with its original domain.
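To make the two quantities in the abstract concrete, the following is a minimal sketch of how a recalled graph could be compared against the true one: edge-level precision, recall, and F1 for accuracy, and a raw triangle count as one simple microstructure. This is an illustration under stated assumptions, not the paper's protocol. The function names (`normalize`, `edge_scores`, `count_triangles`) and the toy graphs are hypothetical, graphs are assumed to be undirected, and the alternating 2-path statistic (an ERGM term) is not reproduced here.

```python
from itertools import combinations


def normalize(edges):
    """Return undirected edges as a set of frozensets, dropping self-loops."""
    return {frozenset(e) for e in edges if e[0] != e[1]}


def edge_scores(true_edges, recalled_edges):
    """Edge-level precision, recall, and F1 of a recalled graph (illustrative metric)."""
    t, r = normalize(true_edges), normalize(recalled_edges)
    hit = len(t & r)  # edges that are both true and recalled
    precision = hit / len(r) if r else 0.0
    recall = hit / len(t) if t else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hit else 0.0
    return precision, recall, f1


def count_triangles(edges):
    """Count node triples whose three pairwise edges are all present."""
    e = normalize(edges)
    nodes = sorted({v for pair in e for v in pair})
    return sum(
        1
        for a, b, c in combinations(nodes, 3)
        if frozenset((a, b)) in e
        and frozenset((b, c)) in e
        and frozenset((a, c)) in e
    )


# Hypothetical toy example: the true graph is a path A-B-C-D,
# while the recalled graph closes a triangle and forgets C-D.
true_graph = [("A", "B"), ("B", "C"), ("C", "D")]
recalled_graph = [("A", "B"), ("B", "C"), ("A", "C")]

precision, recall, f1 = edge_scores(true_graph, recalled_graph)
true_triangles = count_triangles(true_graph)
recalled_triangles = count_triangles(recalled_graph)
```

In this toy case the recalled graph contains one triangle where the true graph has none, which is the kind of triangle-favoring bias the abstract describes. A real analysis would fit a statistical model over many samples rather than compare raw counts.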

Paper Structure (28 sections, 2 equations, 9 figures, 7 tables)

Introduction
Preliminaries
Exponential Random Graph Model (ERGM)
Memory Clearance
Microstructures and Accuracy of Graph Recall by LLMs
Experimental Protocols and Datasets
Results and Analysis
LLMs Compared with Humans in Graph Recall
What Affects LLM's Graph Recall?
Narrative Style
Strength of Memory Clearance
Correlation between LLM's Graph Recall and Link Prediction
Result Analysis
What to Inform about Future Research: an Empirical Perspective
Related Work
...and 13 more sections

Figures (9)

Figure 1: Graph recall is a simple task but also a crucial pivot for other graph reasoning tasks.
Figure 2: Experimental protocols for analyzing microstructures and accuracy of LLMs' graph recall. See Sec. \ref{subsec:protocols} for detailed explanations.
Figure 3: Different factors that influence LLMs' graph recall. (a) - (c): narrative styles. The heatmaps show that more advanced LLMs like GPT-4 yield the best recall accuracy when the graph is narrated in a language style consistent with its original domain. (d) - (f): memory clearance. Gemini-Pro appears more sensitive to small noise in context, while the GPT models are more robust.
Figure 4: Correlation between GPT-3.5's performance at graph recall ($y$) and link prediction ($x$).
Figure 5: Graph samples of the Facebook dataset.
...and 4 more figures
