Efficient End-to-end Language Model Fine-tuning on Graphs

Rui Xue; Xipeng Shen; Ruozhou Yu; Xiaorui Liu

TL;DR

This study introduces LEADING, a novel and efficient approach for end-to-end fine-tuning of language models (LMs) on text-attributed graphs (TAGs). It achieves state-of-the-art (SOTA) results on the ogbn-arxiv leaderboard while keeping computation cost and memory overhead comparable to graph-less fine-tuning of LMs.

Abstract

Learning from Text-Attributed Graphs (TAGs) has attracted significant attention due to its wide range of real-world applications. The rapid evolution of language models (LMs) has revolutionized the way we process textual data, indicating strong potential to replace the shallow text embeddings generally used in Graph Neural Networks (GNNs). However, we find that existing LM approaches that exploit text information in graphs suffer from poor computation and data efficiency. In this study, we introduce LEADING, a novel and efficient approach for end-to-end fine-tuning of language models on TAGs. To enhance data efficiency, LEADING transfers rich knowledge from LMs to downstream graph learning tasks with limited labeled data by training LMs and GNNs end-to-end in a semi-supervised learning setting. To address the associated computation efficiency issues, it introduces two techniques: neighbor decoupling, which targets LMs, and implicit graph modeling, which targets GNNs. Our proposed approach achieves state-of-the-art (SOTA) results on the ogbn-arxiv leaderboard while maintaining computation cost and memory overhead comparable to graph-less fine-tuning of LMs. Through comprehensive experiments, we showcase its superior computation and data efficiency, presenting a promising solution for various LMs and graph learning tasks on TAGs.

Paper Structure (21 sections, 13 equations, 5 figures, 15 tables, 1 algorithm)

Introduction
Related Work
Methodology
Computation Redundancy in LM-GNN
LEADING in LMs: Neighbor Decoupling
LEADING in GNNs: Implicit Graph Modeling
Computation Complexity Analysis
Experiment
Prediction Performance
Efficiency Analysis
Scalability Comparison
Ablation Study
Conclusion
Proof of Eq. \ref{eq:backward}
GPT-2 Performance and Efficiency Analysis
...and 6 more sections

Figures (5)

Figure 1: Encoding Redundancy in Mini-batch GNNs.
Figure 2: The two-pipeline training process of LEADING. (a) A randomly sampled batch is encoded in pipeline 2 and stored in memory. (b) Only the target nodes within the neighbor-sampled batch are encoded with gradients in pipeline 1. (c) Neighbor nodes' embeddings are retrieved, and the resulting subgraph is fed into GNNs. (d) Gradients from target nodes are used to fine-tune the language models.
Figure 3: Convergence comparison
Figure 4: Memory Cost
Figure 5: Encoding Times
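
The encoding redundancy named in Figure 1 and the target-only encoding described in Figure 2 can be illustrated with a small counting sketch. This is not the authors' implementation: the toy graph, the batches, and the function names are all hypothetical, and neighbor sampling is replaced by a full 1-hop expansion so the counts are deterministic. It compares how many gradient-carrying LM encodings one pass over the data would need when every node in a neighbor-expanded batch is encoded, versus when only target nodes are encoded and neighbor embeddings are looked up from a cache. The cost of refreshing that cache (pipeline 2 in Figure 2) is deliberately left out.

```python
# Toy illustration only: the graph, batches, and function names are hypothetical.
GRAPH = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
BATCHES = [[0, 1], [2, 3], [4]]  # target nodes per mini-batch


def expand_one_hop(targets, graph):
    """Return the targets plus all their 1-hop neighbors (no sampling)."""
    nodes = set(targets)
    for node in targets:
        nodes.update(graph[node])
    return nodes


def coupled_encodings(batches, graph):
    """Baseline: the LM encodes every node of each expanded batch with gradients."""
    return sum(len(expand_one_hop(batch, graph)) for batch in batches)


def decoupled_encodings(batches, graph):
    """Decoupled: only targets are encoded with gradients; neighbors are cache lookups."""
    with_grad = 0
    cache_lookups = 0
    for batch in batches:
        expanded = expand_one_hop(batch, graph)
        with_grad += len(batch)
        cache_lookups += len(expanded) - len(set(batch))
    return with_grad, cache_lookups


if __name__ == "__main__":
    print(coupled_encodings(BATCHES, GRAPH))    # 11
    print(decoupled_encodings(BATCHES, GRAPH))  # (5, 6)
```

On this five-node toy graph the baseline performs 11 gradient-carrying encodings per pass, while the decoupled variant performs 5 (one per node) plus 6 cache lookups. The numbers only show where the redundancy comes from and say nothing about the savings reported in the paper.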
