Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path
Xinnan Dai, Qihao Wen, Yifei Shen, Hongzhi Wen, Dongsheng Li, Jiliang Tang, Caihua Shan
TL;DR
This paper critically re-evaluates the graph reasoning capabilities of large language models on three fundamental tasks (graph description translation, graph connectivity, and the shortest-path problem), using balanced synthetic datasets and real-world knowledge graphs. It systematically analyzes how graph description methods, connectivity types, and prompt strategies affect performance, and finds persistent failures when graphs are described purely in text and on longer or more complex graphs. The authors show that Node List descriptions, meaningful node names, and algorithm-guided prompts (e.g., BFS-CoT, Dijkstra-CoT) can improve reasoning, and that model scale and training data substantially improve outcomes, with GPT-4 typically outperforming GPT-3 and Llama variants. The findings offer concrete guidelines for dataset design, prompt construction, and model tuning to enhance graph reasoning in AI systems, while highlighting intrinsic limitations of text-only graph understanding.
Abstract
Large Language Models (LLMs) have achieved great success in various reasoning tasks. In this work, we focus on the graph reasoning ability of LLMs. Although theoretical studies have proved that LLMs are capable of handling graph reasoning tasks, empirical evaluations reveal numerous failures. To deepen our understanding of this discrepancy, we revisit the ability of LLMs on three fundamental graph tasks: graph description translation, graph connectivity, and the shortest-path problem. Our findings suggest that LLMs can fail to understand graph structures through text descriptions and exhibit varying performance across all three fundamental tasks. Meanwhile, we perform a real-world investigation on knowledge graphs and make observations consistent with our findings. The code and datasets are available.
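To make the three tasks concrete, the following is a minimal sketch of how the same small graph might be rendered as two text descriptions and how ground-truth answers for connectivity and shortest path could be computed. It is an illustration only: the sentence templates, the function names, and the restriction to unweighted, undirected graphs are assumptions made here, not the paper's exact formats or evaluation code.

```python
from collections import deque


def to_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_list_text(edges):
    """Describe the graph one edge at a time (hypothetical template)."""
    return " ".join(f"Node {u} is connected to node {v}." for u, v in edges)


def node_list_text(edges):
    """Describe the graph one node at a time, listing its neighbors
    (hypothetical template for a Node List style description)."""
    adj = to_adjacency(edges)
    parts = []
    for u in sorted(adj):
        neighbors = ", ".join(str(v) for v in sorted(adj[u]))
        parts.append(f"Node {u} is connected to nodes {neighbors}.")
    return " ".join(parts)


def shortest_path_length(edges, source, target):
    """Breadth-first search: number of hops from source to target,
    or None if target is unreachable. Assumes unweighted edges."""
    if source == target:
        return 0
    adj = to_adjacency(edges)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def is_connected(edges, source, target):
    """Connectivity ground truth: is there any path between the nodes?"""
    return shortest_path_length(edges, source, target) is not None


# Two components: 0-1-2 and 3-4.
edges = [(0, 1), (1, 2), (3, 4)]
print(edge_list_text(edges))
print(node_list_text(edges))
print(is_connected(edges, 0, 2), shortest_path_length(edges, 0, 2))
print(is_connected(edges, 0, 3))
```

Converting between the two descriptions corresponds to the translation task, while the two helper functions supply reference answers against which a model's connectivity and shortest-path responses could be scored.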
