Distilling Large Language Models for Text-Attributed Graph Learning

Bo Pan, Zheng Zhang, Yifei Zhang, Yuntong Hu, Liang Zhao

TL;DR

This work tackles the challenge of deploying powerful LLM-driven TAG learning under constraints of scalability, cost, and privacy by distilling LLM capabilities into a local graph model. It introduces an interpreter model that absorbs LLM rationales and pseudo-supervision, then guides a student graph model that operates without LLMs at test time. A semantics- and structure-aware alignment mechanism preserves text and graph information during distillation, enabling robust, LLM-free TAG predictions. Across four TAG datasets, the approach yields a 6.2% average improvement over baselines and demonstrates data efficiency and practical viability for privacy-preserving TAG learning.

Abstract

Text-Attributed Graphs (TAGs) are graphs of connected textual documents. Graph models can learn from TAGs efficiently, but their training relies heavily on human-annotated labels, which are scarce or even unavailable in many applications. Large language models (LLMs) have recently demonstrated remarkable capabilities in few-shot and zero-shot TAG learning, but they suffer from scalability, cost, and privacy issues. Therefore, in this work, we focus on combining the complementary strengths of LLMs and graph models by distilling the capabilities of LLMs into a local graph model for TAG learning. To bridge the inherent gap between LLMs (generative models for text) and graph models (discriminative models for graphs), we propose first letting the LLM teach an interpreter model using rich textual rationales, and then letting a student model mimic the interpreter's reasoning without access to the LLM's textual rationales. Extensive experiments validate the efficacy of our proposed framework.

Paper Structure (18 sections, 12 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Illustration of our proposed LLM-to-graph-model knowledge distillation framework. (a) The general distillation framework. We distill the knowledge of an LLM by using LLM-generated rationales and supervision to train an interpreter model, then align the student model, which operates on raw features, with the interpreter model. (b) Training of the interpreter model with rationales and pseudo-supervision. The interpreter model takes rationales as input, including keywords, key edges, and key messages. LLM-generated pseudo-supervision, including pseudo-labels and soft labels, is used to train the interpreter model. (c) The proposed model alignment framework. The student model, which takes the original features as input, is aligned with the interpreter model at the text and graph levels based on the discrepancy between raw inputs and rationale-enhanced inputs.
  • Figure 2: Different approaches to incorporating rationales when training graph models.
  • Figure 3: Test accuracy for different proportions of available training data on four datasets. The x-axis represents the fraction of the dataset used as training data, ranging from 0.001 to 0.6, plotted on a logarithmic scale. The y-axis represents the test accuracy.
  • Figure 4: Sensitivity analysis of parameters $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$. The red line denotes the baseline performance.
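
The pseudo-supervision described in Figure 1(b) combines a hard pseudo-label with soft labels from the LLM. As a rough illustration only, the sketch below shows one common way such signals can be combined for a single node: cross-entropy against the pseudo-label plus a weighted KL divergence against the soft labels. The function names, the `weight` parameter, and the exact form of the loss are assumptions made for this example; the paper's own objective and its $\lambda$ weights are not reproduced here.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_supervision_loss(logits, pseudo_label, soft_labels, weight=0.5):
    """Hypothetical combined loss for one node.

    logits:       interpreter model scores, one per class
    pseudo_label: index of the class the LLM predicted (hard label)
    soft_labels:  LLM-provided class distribution (sums to 1)
    weight:       assumed trade-off between the two terms (not from the paper)
    """
    probs = softmax(logits)
    # Cross-entropy against the hard pseudo-label.
    ce = -math.log(probs[pseudo_label])
    # KL(soft_labels || probs); zero-probability targets contribute nothing.
    kl = sum(q * math.log(q / p) for q, p in zip(soft_labels, probs) if q > 0)
    return ce + weight * kl


# Example: three classes, the LLM favors class 0.
soft = [0.8, 0.1, 0.1]
loss_agree = pseudo_supervision_loss([5.0, 0.0, 0.0], 0, soft)
loss_disagree = pseudo_supervision_loss([0.0, 5.0, 0.0], 0, soft)
print(loss_agree < loss_disagree)
```

In this sketch, predictions that agree with the LLM's pseudo-label and soft labels receive a lower loss than predictions that contradict them, which is the behavior a teacher-to-interpreter training signal needs.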