Exploring the Potential of Large Language Models in Graph Generation

Yang Yao, Xin Wang, Zeyang Zhang, Yijian Qin, Ziwei Zhang, Xu Chu, Yuekui Yang, Wenwu Zhu, Hong Mei

TL;DR

This paper proposes LLM4GraphGen to explore the ability of LLMs for graph generation with systematic task designs and extensive experiments, and demonstrates that LLMs, particularly GPT-4, exhibit preliminary abilities in graph generation tasks, including rule-based and distribution-based generation.

Abstract

Large language models (LLMs) have achieved great success in many fields, and recent work has explored LLMs for graph discriminative tasks such as node classification. However, the abilities of LLMs for graph generation remain unexplored in the literature. Graph generation requires the LLM to generate graphs with given properties, which has valuable real-world applications such as drug discovery but tends to be more challenging. In this paper, we propose LLM4GraphGen to explore the ability of LLMs for graph generation with systematic task designs and extensive experiments. Specifically, we propose several tasks, paired with comprehensive experiments, to address key questions regarding LLMs' understanding of different graph structure rules, their ability to capture structural type distributions, and their utilization of domain knowledge for property-based graph generation. Our evaluations demonstrate that LLMs, particularly GPT-4, exhibit preliminary abilities in graph generation tasks, including rule-based and distribution-based generation. We also observe that popular prompting methods, such as few-shot and chain-of-thought prompting, do not consistently enhance performance. In addition, LLMs show potential in generating molecules with specific properties. These findings may serve as foundations for designing good LLM-based models for graph generation and provide valuable insights for further research.

Paper Structure (26 sections, 4 equations, 5 figures, 18 tables)

Figures (5)

  • Figure 1: An overview of LLM4GraphGen. Our proposed method designs a prompt tailored to each graph generation task, which is subsequently used as the input to the LLM to generate the desired graphs. Each prompt encompasses both the task description and the required output format. In the case of rule-based generation, the prompt contains the description of the rule. For distribution-based generation, a collection of graphs is provided to facilitate the LLM's learning of the underlying distribution. For property-based generation, a collection of molecules is included to enable the LLM to understand molecular properties.
  • Figure 2: An illustration of graphs with regard to different rules.
  • Figure 3: An illustration of distribution-based graph generation.
  • Figure 4: An illustration of property-based graph generation.
  • Figure 5: Performance of GPT-4 on distribution-based graph generation. $p$ is the parameter of the distribution from which the input graphs are sampled. $p_{\text{pred}}$ is the value of $p$ that the LLM predicts for the input graphs. $p_{\text{gen}}$ is the value of $p$ calculated from the generated graphs. In this task, an LLM performs better when $p_{\text{pred}}$ and $p_{\text{gen}}$ are closer to the ground-truth parameter $p$.
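
To make the quantities in the Figure 5 caption concrete, the sketch below shows one way $p_{\text{gen}}$ could be computed from LLM output. It is an illustration under stated assumptions, not the paper's evaluation code. It assumes that $p$ is the fraction of graphs that are trees (one possible "structural type"), that the LLM returns each graph as a text edge list such as `(0, 1), (1, 2)`, and that nodes are labelled `0..n-1`. The helper names `parse_edge_list`, `is_tree`, and `estimate_p_gen` are hypothetical.

```python
import re
from typing import List, Tuple

Edge = Tuple[int, int]


def parse_edge_list(text: str) -> List[Edge]:
    """Extract (u, v) integer pairs from text such as '(0, 1), (1, 2)'.

    The edge-list output format is an assumption made for this sketch.
    """
    pairs = re.findall(r"\((\d+)\s*,\s*(\d+)\)", text)
    return [(int(u), int(v)) for u, v in pairs]


def is_tree(n: int, edges: List[Edge]) -> bool:
    """Check the tree rule: n nodes, exactly n - 1 edges, and no cycle."""
    if n < 1 or len(edges) != n - 1:
        return False
    parent = list(range(n))  # union-find forest over node labels 0..n-1

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if not (0 <= u < n and 0 <= v < n):
            return False  # edge refers to a node outside 0..n-1
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # self-loop, duplicate edge, or cycle
        parent[root_u] = root_v
    return True


def estimate_p_gen(n: int, outputs: List[str]) -> float:
    """Fraction of generated graphs that are trees (assumed meaning of p_gen)."""
    if not outputs:
        raise ValueError("need at least one generated graph")
    graphs = [parse_edge_list(text) for text in outputs]
    return sum(is_tree(n, g) for g in graphs) / len(graphs)


# Hypothetical LLM outputs for graphs on 4 nodes.
outputs = [
    "(0, 1), (1, 2), (2, 3)",          # path: a tree
    "(0, 1), (1, 2), (2, 0), (0, 3)",  # 4 edges: not a tree
    "(0, 1), (0, 2), (0, 3)",          # star: a tree
    "(0, 1), (1, 2), (2, 0)",          # triangle plus isolated node: not a tree
]
p = 0.5  # hypothetical ground-truth parameter
p_gen = estimate_p_gen(4, outputs)
print(f"p_gen = {p_gen:.2f}, |p_gen - p| = {abs(p_gen - p):.2f}")
```

Under these assumptions, a smaller gap `|p_gen - p|` corresponds to the caption's notion of better performance. The same comparison applies to $p_{\text{pred}}$, which would be read directly from the LLM's answer rather than computed from generated graphs.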