GAugLLM: Improving Graph Contrastive Learning for Text-Attributed Graphs with Large Language Models

Yi Fang; Dongzhe Fan; Daochen Zha; Qiaoyu Tan

TL;DR

GAugLLM is a novel framework for augmenting text-attributed graphs (TAGs). It leverages advanced large language models such as Mistral to enhance self-supervised graph learning, and it uses a mixture-of-prompt-experts technique to generate augmented node features.

Abstract

This work studies self-supervised graph learning for text-attributed graphs (TAGs), where nodes are represented by textual attributes. Unlike traditional graph contrastive methods that perturb the numerical feature space and alter the graph's topological structure, we aim to improve view generation through language supervision. This is driven by the prevalence of textual attributes in real applications, which complement graph structures with rich semantic information. However, this is challenging for two main reasons. First, text attributes often vary in length and quality, making it difficult to perturb raw text descriptions without altering their original semantic meanings. Second, although text attributes complement graph structures, they are not inherently well-aligned. To bridge the gap, we introduce GAugLLM, a novel framework for augmenting TAGs. It leverages advanced large language models like Mistral to enhance self-supervised graph learning. Specifically, we introduce a mixture-of-prompt-experts technique to generate augmented node features. This approach adaptively maps multiple prompt experts, each of which modifies raw text attributes using prompt engineering, into numerical feature space. Additionally, we devise a collaborative edge modifier to leverage structural and textual commonalities, enhancing edge augmentation by examining or building connections between nodes. Empirical results across five benchmark datasets spanning various domains underscore our framework's ability to enhance the performance of leading contrastive methods as a plug-in tool. Notably, we observe that the augmented features and graph structure can also enhance the performance of standard generative methods, as well as popular graph neural networks. The open-sourced implementation of our GAugLLM is available at GitHub.
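As a rough illustration of how several prompt experts might be mapped adaptively into one numerical feature vector per node, the sketch below combines per-expert text embeddings with attention weights derived from a per-node context vector. The exact formulation used by GAugLLM is not given in this summary, so everything here is an assumption: the function name `mix_prompt_experts`, the projection matrices `w_query` and `w_key`, the scaled dot-product scoring, and the random inputs standing in for encoder outputs and graph statistics.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mix_prompt_experts(expert_emb, context, w_query, w_key):
    """Attention-weighted combination of prompt-expert embeddings.

    expert_emb: (n, k, d) embeddings of k augmented texts per node
    context:    (n, c)    per-node context, e.g. graph statistics (assumed)
    w_query:    (c, h)    hypothetical projection of the context
    w_key:      (d, h)    hypothetical projection of expert embeddings
    Returns the mixed features (n, d) and the expert weights (n, k).
    """
    query = context @ w_query                # (n, h)
    keys = expert_emb @ w_key                # (n, k, h)
    # One score per (node, expert), scaled as in dot-product attention.
    scores = np.einsum("nh,nkh->nk", query, keys) / np.sqrt(keys.shape[-1])
    weights = softmax(scores, axis=1)        # each row sums to 1
    mixed = np.einsum("nk,nkd->nd", weights, expert_emb)
    return mixed, weights


# Toy inputs: random arrays in place of real text-encoder outputs.
rng = np.random.default_rng(0)
n, k, d, c, h = 4, 3, 8, 5, 6
expert_emb = rng.normal(size=(n, k, d))
context = rng.normal(size=(n, c))
mixed, weights = mix_prompt_experts(
    expert_emb, context, rng.normal(size=(c, h)), rng.normal(size=(d, h))
)
```

In a trained model the two projection matrices would be learned jointly with the contrastive objective; here they are random only so that the shapes and the weighting step can be checked.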

Paper Structure (22 sections, 5 equations, 7 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Preliminary
Methodology
Mixture-of-Prompt-Experts
Prompt experts.
Text encoder.
Context-aware selector.
Collaborative Edge Modifier
Edge deletion.
Edge addition.
Graph Contrastive Learning for TAGs
Experiments
Experimental Setup
Overall Evaluation
...and 7 more sections

Figures (7)

Figure 1: The learning paradigm of GAugLLM vs. traditional GCL methods on TAGs. While standard GCL methodologies rely on text attributes primarily to generate numerical node features via shallow embedding models, such as word2vec, our GAugLLM endeavors to advance contrastive learning on graphs through advanced LLMs. This includes the direct perturbation of raw text attributes for feature augmentation, facilitated by a novel mixture-of-prompt-experts technique. Additionally, GAugLLM harnesses both structural and textual commonalities to effectively perturb edges deemed most spurious or likely to be connected, thereby enhancing structure augmentation.
Figure 2: The pipeline of the mixture-of-prompt-experts for feature augmentation. It takes a TAG as input and then utilizes multiple prompt experts to perturb the original text attributes, generating diverse augmented attributes. These augmented text attributes are then integrated into a unified augmentation feature by considering the graph statistics as attention context.
Figure 3: Ablation study of GAugLLM on the History dataset. "IDR", "SAR", and "SAS" denote scenarios where we only employ the corresponding prompt expert for feature augmentation. "Concat" means we directly aggregate the hidden representations of all prompt experts as the final output.
Figure 4: Ablation study of GAugLLM w.r.t. collaborative edge modifier on Photo dataset.
Figure 5: Sensitivity analysis of GAugLLM w.r.t. the sampling ratio in collaborative edge modifier.
...and 2 more figures
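
The edge perturbation described in the Figure 1 caption (removing edges deemed most spurious and adding pairs deemed likely to be connected) can be sketched with a simple scoring heuristic. This is not the paper's procedure: it is a stand-in that assumes a pair score built from cosine similarity of text embeddings plus a normalised common-neighbour count, and the function name `modify_edges` and its `ratio` parameter (loosely analogous to the sampling ratio in Figure 5) are hypothetical.

```python
import numpy as np


def modify_edges(adj, text_emb, ratio=0.1):
    """Drop the lowest-scoring edges and add the highest-scoring non-edges.

    adj:      (n, n) symmetric 0/1 adjacency matrix without self-loops
    text_emb: (n, d) text embeddings with non-zero rows
    ratio:    fraction of existing edges to drop and to add (assumed knob)
    """
    n = adj.shape[0]
    # Textual commonality: cosine similarity between node texts.
    unit = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    text_sim = unit @ unit.T
    # Structural commonality: common-neighbour counts, scaled to [0, 1].
    common = adj @ adj
    struct_sim = common / max(common.max(), 1)
    score = text_sim + struct_sim

    # Enumerate each unordered node pair once.
    rows, cols = np.triu_indices(n, k=1)
    pairs = np.stack([rows, cols], axis=1)
    is_edge = adj[rows, cols] > 0
    edges, non_edges = pairs[is_edge], pairs[~is_edge]

    m = max(1, int(ratio * len(edges)))
    edge_scores = score[edges[:, 0], edges[:, 1]]
    non_edge_scores = score[non_edges[:, 0], non_edges[:, 1]]
    drop = edges[np.argsort(edge_scores)[:m]]          # least supported edges
    add = non_edges[np.argsort(-non_edge_scores)[:m]]  # best supported non-edges

    new_adj = adj.copy()
    for i, j in drop:
        new_adj[i, j] = new_adj[j, i] = 0
    for i, j in add:
        new_adj[i, j] = new_adj[j, i] = 1
    return new_adj, drop, add


# Toy example: a 5-node path graph with random text embeddings.
rng = np.random.default_rng(0)
adj = np.zeros((5, 5), dtype=int)
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1
new_adj, drop, add = modify_edges(adj, rng.normal(size=(5, 16)), ratio=0.25)
```

Scoring every node pair costs quadratic time and memory, so a practical version would restrict candidates, for example to a sampled subset of pairs.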
