
LLM4DyG: Can Large Language Models Solve Spatial-Temporal Problems on Dynamic Graphs?

Zeyang Zhang, Xin Wang, Ziwei Zhang, Haoyang Li, Yijian Qin, Wenwu Zhu

TL;DR

This paper tackles the problem of assessing how well large language models (LLMs) understand spatial-temporal information on dynamic graphs, a setting common in real-world web data. It introduces the LLM4DyG benchmark, with nine tasks spanning temporal, spatial, and spatial-temporal queries, and analyzes how data generators, graph statistics, prompting strategies, and model choice affect performance. A key contribution is the Disentangled Spatial-Temporal Thoughts (DST2) prompting method, which guides LLMs to reason about temporal and structural aspects separately. The findings show that LLMs have preliminary spatial-temporal understanding on dynamic graphs, that the tasks become harder as graph size and density increase, and that DST2 improves performance on most tasks. Together, these results offer a promising direction for integrating LLMs into dynamic-graph analysis in web applications.
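To make the idea of disentangled prompting concrete, the sketch below assembles a zero-shot prompt from a list of (u, v, t) temporal edges and, optionally, appends a hint asking the model to reason about timestamps before graph structure. It is a minimal illustration under stated assumptions: the edge-list serialization, the function names `format_edges` and `build_prompt`, and the hint wording in `DISENTANGLED_HINT` are all hypothetical and are not the exact DST2 prompt used in the paper.

```python
from typing import List, Tuple

# A temporal edge (u, v, t): nodes u and v are linked at timestamp t.
Edge = Tuple[int, int, int]

# Hypothetical hint text; the paper's exact DST2 wording is not reproduced here.
DISENTANGLED_HINT = (
    "First, consider only the timestamps relevant to the question. "
    "Then, consider the nodes and edges at those timestamps."
)


def format_edges(edges: List[Edge]) -> str:
    """Serialize a dynamic graph as a comma-separated list of (u, v, t) tuples."""
    return ", ".join(f"({u}, {v}, {t})" for u, v, t in edges)


def build_prompt(edges: List[Edge], question: str, disentangled: bool = True) -> str:
    """Assemble a zero-shot prompt, optionally ending with a disentangled-thought hint."""
    parts = [
        "In an undirected dynamic graph, (u, v, t) means that node u and node v "
        "are linked with an edge at time t.",
        f"Edges: {format_edges(edges)}",
        f"Question: {question}",
    ]
    if disentangled:
        # The hint asks the model to handle time first, then structure.
        parts.append(DISENTANGLED_HINT)
    return "\n".join(parts)


if __name__ == "__main__":
    edges = [(0, 1, 0), (1, 2, 1), (0, 2, 2)]
    print(build_prompt(edges, "What nodes are linked with node 0 at time 2?"))
```

Keeping the hint as a separate, optional suffix makes it easy to compare the same question with and without the disentangled instruction.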

Abstract

In an era marked by the increasing adoption of Large Language Models (LLMs) for various tasks, there is a growing focus on exploring LLMs' capabilities in handling web data, particularly graph data. Dynamic graphs, which capture temporal network evolution patterns, are ubiquitous in real-world web data. Evaluating LLMs' competence in understanding spatial-temporal information on dynamic graphs is essential for their adoption in web applications, yet this question remains unexplored in the literature. In this paper, we bridge the gap by evaluating LLMs' spatial-temporal understanding abilities on dynamic graphs, to the best of our knowledge for the first time. Specifically, we propose the LLM4DyG benchmark, which includes nine specially designed tasks that evaluate the capabilities of LLMs along both temporal and spatial dimensions. Then, we conduct extensive experiments to analyze the impacts of different data generators, data statistics, prompting techniques, and LLMs on model performance. Finally, we propose Disentangled Spatial-Temporal Thoughts (DST2), a prompting method for LLMs on dynamic graphs that enhances their spatial-temporal understanding abilities. Our main observations are: 1) LLMs have preliminary spatial-temporal understanding abilities on dynamic graphs; 2) dynamic graph tasks become increasingly difficult for LLMs as graph size and density increase, while performance is not sensitive to the time span or the data generation mechanism; 3) the proposed DST2 prompting method helps improve LLMs' spatial-temporal understanding abilities on dynamic graphs for most tasks. The data and code are publicly available on GitHub.
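The experiments vary network size $N$, density $p$, and time span $T$ (see Figures 3 and 4). As a rough sketch of what one such data generator might look like, the code below samples an Erdős–Rényi-style dynamic graph. The sampling scheme (each unordered node pair becomes an edge with probability $p$ and receives a timestamp drawn uniformly from $\{0, \dots, T-1\}$) and the name `erdos_renyi_dynamic_graph` are assumptions for illustration, not the benchmark's own generator code.

```python
import random
from itertools import combinations
from typing import List, Tuple

# A temporal edge (u, v, t) with u < v.
Edge = Tuple[int, int, int]


def erdos_renyi_dynamic_graph(n: int, p: float, t_span: int, seed: int = 0) -> List[Edge]:
    """Sample an undirected dynamic graph on nodes 0..n-1.

    Assumed scheme: each unordered node pair becomes an edge with
    probability p, and each edge gets a timestamp drawn uniformly
    from {0, ..., t_span - 1}.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if t_span < 1:
        raise ValueError("t_span must be at least 1")
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    edges = []
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            edges.append((u, v, rng.randrange(t_span)))
    # Sort chronologically so a serialized graph reads in time order.
    edges.sort(key=lambda e: (e[2], e[0], e[1]))
    return edges


if __name__ == "__main__":
    print(erdos_renyi_dynamic_graph(n=5, p=0.5, t_span=3, seed=42))
```

Setting `t_span=1` stamps every edge with time 0, which corresponds to the static-graph special case noted for $T=1$ in Figure 4.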

Paper Structure

This paper contains 31 sections, 5 figures, 12 tables.

Figures (5)

  • Figure 1: An overview of the tasks in the LLM4DyG Benchmark. The tasks consider both temporal and spatial dimensions and ask LLMs, in natural language, when, what, or whether spatial-temporal patterns occur. These patterns range from temporal links and chronological paths to dynamic triadic closure. The tasks are classified by the targets of their queries. An example prompt and graph illustration are provided for each task.
  • Figure 2: An overview of the pipeline in the LLM4DyG Benchmark, which includes various dynamic graph generators, tasks, prompt methods, and LLMs for evaluation.
  • Figure 3: Performance comparisons (ACC%) on the dynamic graph tasks with different density $p$ and time span $T$. (Best viewed in color)
  • Figure 4: Performance of GPT-3.5 on the 'neighbor at time' task as the time span $T$ increases with different network sizes $N$. Note that when $T=1$, the data degenerates to a static graph, since there is only one timestamp on the graph.
  • Figure 5: Performance comparisons (ACC%) of various LLMs on the dynamic graph tasks. 'Random' denotes the random baseline, which outputs one of the possible solutions uniformly at random. (Best viewed in color)
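Ground-truth answers for several of the queries described in Figure 1 can be computed directly from a temporal edge list. The sketch below covers three of them under assumed definitions: `when_linked` returns the timestamps at which two nodes are linked, `neighbors_at_time` returns the nodes linked with a given node at time t (in the spirit of the 'neighbor at time' task in Figure 4), and `is_chronological_path` checks whether a node sequence can be traversed with non-decreasing timestamps. The function names and the non-decreasing-time definition of a chronological path are assumptions, not the benchmark's reference implementation.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A temporal edge (u, v, t): nodes u and v are linked at timestamp t (undirected).
Edge = Tuple[int, int, int]


def when_linked(edges: List[Edge], u: int, v: int) -> List[int]:
    """Sorted, de-duplicated timestamps at which u and v are linked."""
    return sorted({t for a, b, t in edges if {a, b} == {u, v}})


def neighbors_at_time(edges: List[Edge], u: int, t: int) -> Set[int]:
    """Nodes linked with u by an edge carrying timestamp t."""
    out: Set[int] = set()
    for a, b, s in edges:
        if s == t and u in (a, b):
            out.add(b if a == u else a)
    return out


def is_chronological_path(edges: List[Edge], path: List[int]) -> bool:
    """Check that consecutive hops can use edges with non-decreasing timestamps."""
    times: Dict[FrozenSet[int], List[int]] = {}
    for a, b, t in edges:
        times.setdefault(frozenset((a, b)), []).append(t)
    current = float("-inf")
    for x, y in zip(path, path[1:]):
        # Greedy: take the earliest edge time that is not before the previous hop.
        later = [t for t in times.get(frozenset((x, y)), []) if t >= current]
        if not later:
            return False
        current = min(later)
    return True


if __name__ == "__main__":
    E = [(0, 1, 0), (1, 2, 1), (0, 1, 3), (2, 3, 0)]
    print(when_linked(E, 0, 1), neighbors_at_time(E, 1, 0), is_chronological_path(E, [0, 1, 2]))
```

Choosing the earliest feasible timestamp at each hop is safe, because a smaller time never rules out a later hop. For a yes/no ('whether') query, a baseline that picks one of the two answers uniformly at random has an expected accuracy of 50%.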