Unleashing the Potential of Text-attributed Graphs: Automatic Relation Decomposition via Large Language Models

Hyunjin Seo, Taewon Kim, June Yong Yang, Eunho Yang

TL;DR

This work identifies a fundamental limitation in text-attributed graphs: treating all edges as a single semantic relation obscures the diverse contextual meanings embedded in node texts, which can hinder GNN learning. It introduces RoSE, a two-stage framework that uses LLMs to automatically identify meaningful semantic relation types and decompose edges accordingly, so that the result can be used with both multi-relational and edge-featured GNNs. Empirical results across seven TAG benchmarks show consistent improvements in node classification, with notable gains on datasets where baseline GNNs underperform and additional benefits from larger LLMs. RoSE thus provides a scalable, automated method to enrich graph structure and enhance downstream performance in real-world applications.

Abstract

Recent advancements in text-attributed graphs (TAGs) have significantly improved the quality of node features by using the textual modeling capabilities of language models. Despite this success, utilizing text attributes to enhance the predefined graph structure remains largely unexplored. Our extensive analysis reveals that conventional edges on TAGs, treated as a single relation (e.g., hyperlinks) in previous literature, actually encompass mixed semantics (e.g., "advised by" and "participates in"). This simplification hinders the representation learning process of Graph Neural Networks (GNNs) on downstream tasks, even when integrated with advanced node features. In contrast, we discover that decomposing these edges into distinct semantic relations significantly enhances the performance of GNNs. However, manually identifying these relations and labeling each edge with the corresponding relation is labor-intensive and often requires domain expertise. To this end, we introduce RoSE (Relation-oriented Semantic Edge-decomposition), a novel framework that leverages the capability of Large Language Models (LLMs) to decompose the graph structure by analyzing raw text attributes in a fully automated manner. RoSE operates in two stages: (1) identifying meaningful relations using an LLM-based generator and discriminator, and (2) assigning each edge to its corresponding relations by analyzing the textual contents of the connected nodes via an LLM-based decomposer. Extensive experiments demonstrate that our model-agnostic framework significantly enhances node classification performance across various datasets, with improvements of up to 16% on the Wisconsin dataset.
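
To make the second stage more concrete, the following is a minimal sketch of edge decomposition, not the authors' implementation. It assumes the relation types from stage one are already available as a list of strings, and it replaces the LLM-based decomposer with a hypothetical keyword rule (`keyword_classifier`) so that the example runs offline. The names `decompose_edges`, `keyword_classifier`, and the toy node texts are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[int, int]
# A classifier takes (source text, destination text, candidate relations)
# and returns one of the candidate relations.
Classifier = Callable[[str, str, List[str]], str]


def decompose_edges(
    edges: List[Edge],
    node_text: Dict[int, str],
    relations: List[str],
    classify: Classifier,
) -> Dict[str, List[Edge]]:
    """Group edges by the relation type that `classify` assigns to each one."""
    by_relation: Dict[str, List[Edge]] = {r: [] for r in relations}
    for src, dst in edges:
        label = classify(node_text[src], node_text[dst], relations)
        if label not in by_relation:
            # Guard against a classifier returning an unknown relation name.
            raise ValueError(f"unknown relation {label!r} for edge {(src, dst)}")
        by_relation[label].append((src, dst))
    return by_relation


def keyword_classifier(src_text: str, dst_text: str, relations: List[str]) -> str:
    """Hypothetical stand-in for an LLM-based decomposer (assumption).

    A real system would prompt a language model with both texts; this toy
    rule only checks whether the destination text mentions an advisor.
    """
    if "advisor" in dst_text.lower():
        return "advised by"
    return "participates in"


# Toy web-page graph; the texts are invented for illustration.
node_text = {
    0: "PhD student homepage",
    1: "Professor and advisor of the systems lab",
    2: "Course page: Operating Systems",
}
edges = [(0, 1), (0, 2), (1, 2)]
relations = ["advised by", "participates in"]

decomposed = decompose_edges(edges, node_text, relations, keyword_classifier)
for relation, relation_edges in decomposed.items():
    print(relation, relation_edges)
```

Each per-relation edge list can then be handed to a multi-relational GNN as a separate edge type, or the relation index can be encoded as an edge feature for an edge-featured GNN.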

Paper Structure

This paper contains 38 sections, 4 equations, 6 figures, 9 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Overall framework of RoSE.
  • Figure 2: Sensitivity to temperature when prompting the relation decomposer. The temperature (varied from 0.2 to 0.8) is shown on the x-axis, and node classification accuracy (%) is shown on the y-axis. Red, yellow, and brown denote RoSE (LLaMA3-70b), RoSE (LLaMA3-8b), and vanilla GNNs (RGCN and GIN), respectively.
  • Figure 3: UMAP visualization analysis between raw features and representations of RGCN trained with single and multiple types of relations.
  • Figure 4: UMAP visualization analysis between raw features and representations of HAN trained with single and multiple types of relations.
  • Figure 5: Comparison of average inter-prototype similarity (i.e., average cosine similarity between per-class mean representation vectors) between raw features and representations of GNNs trained with single and multiple types of relations.
  • ...and 1 more figure
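
The inter-prototype similarity named in the Figure 5 caption can be sketched as follows. This is an illustrative reading of the caption, not the paper's code: it assumes the average is taken over all unordered pairs of distinct classes, and the function names (`class_prototypes`, `cosine`, `inter_prototype_similarity`) are hypothetical. Under this reading, a lower value suggests that the class prototypes point in more distinct directions.

```python
import math
from itertools import combinations
from typing import Dict, Hashable, List, Sequence


def class_prototypes(
    features: Sequence[Sequence[float]], labels: Sequence[Hashable]
) -> Dict[Hashable, List[float]]:
    """Return the mean representation vector of each class."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def inter_prototype_similarity(
    features: Sequence[Sequence[float]], labels: Sequence[Hashable]
) -> float:
    """Average cosine similarity over all unordered pairs of class prototypes.

    Averaging over distinct class pairs is an assumption; the caption does
    not state how the pairs are enumerated.
    """
    prototypes = class_prototypes(features, labels)
    pairs = list(combinations(list(prototypes), 2))
    if not pairs:
        raise ValueError("need at least two classes")
    total = sum(cosine(prototypes[a], prototypes[b]) for a, b in pairs)
    return total / len(pairs)


# Toy example: two classes whose prototypes are orthogonal.
separated = inter_prototype_similarity(
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], [0, 0, 1, 1]
)
# Toy example: two classes whose prototypes are 45 degrees apart.
overlapping = inter_prototype_similarity([[1.0, 0.0], [1.0, 1.0]], [0, 1])
print(separated, overlapping)
```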