How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension
Xinnan Dai, Haohao Qu, Yifen Shen, Bohang Zhang, Qihao Wen, Wenqi Fan, Dongsheng Li, Jiliang Tang, Caihua Shan
TL;DR
This work investigates whether large language models (LLMs) can understand and reason about graph patterns. It introduces a benchmark covering three settings: patterns specified by terminology-based descriptions, patterns specified by topology-based descriptions, and patterns discovered from data. The evaluation spans 11 tasks and 7 models on synthetic and real-world graphs, with inputs given as adjacency lists or edge lists. Tasks include pattern translation, isomorphic mapping, graph modification, pattern detection, dense-subgraph mining, frequent subgraph extraction, and discriminative pattern learning. The key findings are that LLMs have preliminary abilities to understand graph patterns, with O1-mini frequently delivering the strongest performance; that formatting inputs to align with knowledge acquired during pretraining improves results; and that the strategies LLMs use can differ from conventional graph algorithms, with hallucinations occurring in some cases. The benchmark provides a scalable, extensible framework for probing graph-pattern reasoning in LLMs and can inform prompting and architectural design for graph-centric AI systems.
Abstract
Benchmarking the capabilities and limitations of large language models (LLMs) in graph-related tasks is an increasingly popular and important area of research. Recent studies have shown that LLMs exhibit a preliminary ability to understand graph structures and node features. However, the potential of LLMs in graph pattern mining, a key component in fields such as computational chemistry, biology, and social network analysis, remains largely unexplored. To bridge this gap, we introduce a comprehensive benchmark to assess LLMs' capabilities in graph pattern tasks. The benchmark evaluates whether LLMs can understand graph patterns based on either terminological or topological descriptions, and it also tests their capacity to autonomously discover graph patterns from data. It encompasses both synthetic and real datasets, with a total of 11 tasks and 7 models. Our experimental framework is designed for easy expansion to accommodate new models and datasets. Our findings reveal that: (1) LLMs have preliminary abilities to understand graph patterns, with O1-mini outperforming the other models on the majority of tasks; (2) formatting input data to align with the knowledge acquired during pretraining can enhance performance; (3) the strategies employed by LLMs may differ from those used in conventional algorithms.
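To make the two input formats mentioned in the TL;DR concrete, the following is a minimal sketch of how the same undirected graph might be serialized as an edge-list description and as an adjacency-list description before being placed in a prompt. The function names and the exact sentence templates are assumptions made for illustration; they are not the benchmark's actual prompt wording.

```python
from collections import defaultdict


def to_edge_list_text(edges):
    """Describe an undirected graph by listing its edges (hypothetical template)."""
    pairs = ", ".join(f"({u}, {v})" for u, v in edges)
    return f"The graph has the following edges: {pairs}."


def to_adjacency_list_text(edges):
    """Describe the same graph node by node (hypothetical template)."""
    # Build an undirected adjacency map: each edge is recorded in both directions.
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # One sentence per node, with nodes and neighbors in sorted order.
    lines = []
    for node in sorted(adjacency):
        neighbors = ", ".join(str(n) for n in sorted(adjacency[node]))
        lines.append(f"Node {node} is connected to nodes {neighbors}.")
    return "\n".join(lines)


# A triangle, one of the simplest graph patterns.
triangle = [(0, 1), (1, 2), (0, 2)]

print(to_edge_list_text(triangle))
print(to_adjacency_list_text(triangle))
```

Both strings encode identical structure, so any difference in model accuracy between them would reflect how the model handles the textual format rather than the graph itself.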
