Can LLMs Convert Graphs to Text-Attributed Graphs?

Zehong Wang, Sidney Liu, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, Yanfang Ye

TL;DR

GNNs struggle with cross-graph learning when node feature spaces differ. The authors introduce Topology-Aware Node description Synthesis (TANS), which prompts large language models with topological information to generate node descriptions, converting existing graphs into text-attributed graphs. The approach covers text-rich, text-limited, and text-free graphs through a four-step pipeline, so that node features from different graphs can be aligned in one textual feature space. Empirically, TANS outperforms baselines in single-graph learning, domain adaptation, and transfer settings, and on text-free graphs it outperforms manually designed node features. The code is open source, and the method enables cross-graph learning without manually collected text.

Abstract

Graphs are ubiquitous structures found in numerous real-world applications, such as drug discovery, recommender systems, and social network analysis. To model graph-structured data, graph neural networks (GNNs) have become a popular tool. However, existing GNN architectures encounter challenges in cross-graph learning, where multiple graphs have different feature spaces. To address this, recent approaches introduce text-attributed graphs (TAGs), where each node is associated with a textual description that can be projected into a unified feature space using textual encoders. While promising, this approach relies heavily on the availability of text-attributed graph data, which is difficult to obtain in practice. To bridge this gap, we propose a novel method named Topology-Aware Node description Synthesis (TANS), leveraging large language models (LLMs) to convert existing graphs into text-attributed graphs. The key idea is to integrate topological information into LLMs to explain how graph topology influences node semantics. We evaluate TANS on text-rich, text-limited, and text-free graphs, demonstrating its applicability. Notably, on text-free graphs, our method significantly outperforms existing approaches that manually design node features, showcasing the potential of LLMs for preprocessing graph-structured data in the absence of textual information. The code and data are available at https://github.com/Zehong-Wang/TANS.
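
To make the key idea concrete, the sketch below shows one way topological information could be folded into an LLM prompt for a single node. It is an illustration only, not the authors' implementation: the choice of properties (degree and local clustering coefficient), the prompt wording, and the names `clustering_coefficient`, `build_prompt`, and `example_adj` are all assumptions, and the call to an LLM and the subsequent text encoding are left out.

```python
from itertools import combinations

def degree(adj, node):
    """Number of neighbors of `node` in an undirected adjacency dict."""
    return len(adj[node])

def clustering_coefficient(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    neighbors = sorted(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to connect
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def build_prompt(adj, node, node_text=None):
    """Assemble a hypothetical topology-aware prompt for one node.

    `node_text` is any existing text for the node; None models the
    text-free case, where only topology is available.
    """
    text_part = node_text if node_text else "(no text available for this node)"
    return (
        f"Node id: {node}\n"
        f"Existing text: {text_part}\n"
        f"Topological properties:\n"
        f"- degree: {degree(adj, node)}\n"
        f"- clustering coefficient: {clustering_coefficient(adj, node):.2f}\n"
        "Write a short description of this node and explain how its "
        "position in the graph relates to its likely role."
    )

# Toy undirected graph (hypothetical): a triangle 0-1-2 plus a pendant node 3.
example_adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}

if __name__ == "__main__":
    print(build_prompt(example_adj, 0))
```

The string returned by `build_prompt` would then be sent to an LLM, and the generated description embedded with a textual encoder so that nodes from different graphs share one feature space.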

Paper Structure

This paper contains 28 sections, 5 equations, 2 figures, 23 tables.

Figures (2)

  • Figure 1: (a) A single GNN model struggles to handle graphs with different feature spaces. (b) Using a textual encoder to align feature spaces across text-attributed graphs (TAGs) facilitates cross-graph learning. (c) However, collecting TAGs is often highly challenging in practice. In this paper, we propose a method to overcome this limitation by automatically generating textual descriptions for nodes in the graph.
  • Figure 2: The framework of our topology-aware node description synthesis (TANS).