
Preference-driven Knowledge Distillation for Few-shot Node Classification

Xing Wei, Chunchun Chen, Rui Fan, Xiaofeng Cao, Sourav Medya, Wei Ye

TL;DR

This work tackles the challenge of few-shot node classification on text-attributed graphs by uniting large language models (LLMs) with diverse graph neural networks (GNNs) through a novel preference-driven knowledge distillation (PKD) framework. PKD introduces two specialized selectors: GNN-preference-driven Node Selector (GNS) to identify LLM-annotated nodes that most benefit teacher GNNs, and Node-preference-driven GNN Selector (NGS) to tailor per-node teacher choices via reinforcement learning. A Graph Topology Aware (GTA) prompting strategy enhances LLM understanding of graph structure, while a Distance-based Neighbor Selector (DNS) enriches prompts with robust neighborhood information. Through extensive experiments on nine real-world TAGs and multiple LLM variants, PKD achieves state-of-the-art or competitive performance in zero-/few-shot settings, highlighting the benefits of combining LLM reasoning with targeted, node-specific distillation from multiple GNN teachers. The approach advances scalable, label-efficient learning for complex graph-structured text data and points to future work on efficiency and broader applicability beyond TAGs.

Abstract

Graph neural networks (GNNs) can efficiently process text-attributed graphs (TAGs) thanks to their message-passing mechanisms, but their training relies heavily on human-annotated labels. Moreover, the complex and diverse local topologies of nodes in real-world TAGs are difficult for any single mechanism to handle. Large language models (LLMs) perform well in zero-/few-shot learning on TAGs but face scalability challenges. We therefore propose a preference-driven knowledge distillation (PKD) framework that synergizes the complementary strengths of LLMs and various GNNs for few-shot node classification. Specifically, we develop a GNN-preference-driven node selector that effectively promotes prediction distillation from LLMs to teacher GNNs. To further tackle nodes' intricate local topologies, we develop a node-preference-driven GNN selector that identifies the most suitable teacher GNN for each node, thereby enabling tailored knowledge distillation from the teacher GNNs to the student GNN. Extensive experiments on real-world TAGs validate the efficacy of the proposed framework for few-shot node classification. Our code will be made available.
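
The final stage described above distills knowledge from each node's selected teacher GNN into the student GNN. The following is a minimal Python sketch under stated assumptions: teacher and student outputs are per-node class-probability vectors, the per-node teacher index is already given (in PKD it would come from the node-preference-driven GNN selector), and the objective is a plain KL divergence from the chosen teacher to the student. The function names, the `eps` smoothing term, and the unweighted average over nodes are illustrative choices, not the paper's actual loss.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes.

    Terms with p_c = 0 contribute nothing; `eps` guards against log(0) when q_c = 0.
    """
    return sum(pc * math.log((pc + eps) / (qc + eps)) for pc, qc in zip(p, q) if pc > 0)


def tailored_kd_loss(
    teacher_probs: List[List[List[float]]],  # [B teachers][N nodes][C classes]
    student_probs: List[List[float]],        # [N nodes][C classes]
    chosen_teacher: List[int],               # assumed per-node teacher index in [0, B)
) -> float:
    """Average over nodes of KL(chosen teacher's prediction || student's prediction)."""
    n = len(student_probs)
    total = 0.0
    for v in range(n):
        teacher_v = teacher_probs[chosen_teacher[v]][v]  # prediction of node v's selected teacher
        total += kl_divergence(teacher_v, student_probs[v])
    return total / n


# Two hypothetical teachers, two nodes, two classes.
teachers = [
    [[0.9, 0.1], [0.5, 0.5]],  # teacher T1
    [[0.2, 0.8], [0.1, 0.9]],  # teacher T2
]
student = [[0.9, 0.1], [0.1, 0.9]]

print(tailored_kd_loss(teachers, student, [0, 1]))      # 0.0: student matches each chosen teacher
print(tailored_kd_loss(teachers, student, [1, 0]) > 0)  # True
```

Changing `chosen_teacher` changes which teacher each node imitates; that per-node choice is the degree of freedom the selector controls, while the distillation objective itself stays the same.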

Paper Structure

This paper contains 35 sections, 1 theorem, 22 equations, 7 figures, 18 tables.

Key Result

Proposition 3.1

Nodes with higher $K$-uncertainty ($\delta_{K}$) are more beneficial for enhancing the GNNs. Here the uncertainty $\delta_{v}$ of node $v$ is defined as $\delta_{v}=\frac{1}{B}\sum_{i=1}^{B}D_{KL}(f_{T_{i}}^{\theta}(v)\,\|\,\mathcal{M}(v))$, where $f_{T_{i}}^{\theta}(v)$ is the prediction probability distribution of teacher GNN $T_{i}$ for node $v$, $\mathcal{M}(v)$ is the average prediction probability distribution of all $B$ teacher GNNs (see Definition D.1 for details), and $D_{KL}(\cdot\|\cdot)$ denotes the Kullback-Leibler (KL) divergence.
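
The uncertainty $\delta_{v}$ in Proposition 3.1 can be computed directly from the teachers' predicted distributions. Below is a minimal Python sketch, assuming each teacher outputs a class-probability vector per node and $\mathcal{M}(v)$ is their uniform average. Because $\delta_{v}$ is the mean KL divergence of each teacher to the mixture, it equals the Jensen-Shannon divergence of the $B$ teacher distributions with uniform weights: it is 0 when all teachers agree and at most $\log B$. The helper `rank_by_uncertainty` and its `budget` parameter are hypothetical; they rank nodes by $\delta_{v}$ alone and do not reproduce the paper's $K$-uncertainty $\delta_{K}$, which is computed in each teacher GNN's embedding space.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q); `eps` guards against division by zero."""
    return sum(pc * math.log((pc + eps) / (qc + eps)) for pc, qc in zip(p, q) if pc > 0)


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats (used only to check the Jensen-Shannon identity)."""
    return -sum(pc * math.log(pc) for pc in p if pc > 0)


def mean_distribution(dists: List[List[float]]) -> List[float]:
    """M(v): class-wise average of the B teachers' predicted distributions."""
    b = len(dists)
    return [sum(d[c] for d in dists) / b for c in range(len(dists[0]))]


def node_uncertainty(teacher_dists: List[List[float]]) -> float:
    """delta_v = (1/B) * sum_i KL(f_{T_i}(v) || M(v))."""
    m = mean_distribution(teacher_dists)
    return sum(kl_divergence(p, m) for p in teacher_dists) / len(teacher_dists)


def rank_by_uncertainty(all_dists: List[List[List[float]]], budget: int) -> List[int]:
    """Hypothetical helper: indices of the `budget` nodes with the largest delta_v."""
    scores = [node_uncertainty(d) for d in all_dists]
    return sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)[:budget]


# all_dists[v][i] = prediction of teacher T_i for node v (made-up numbers).
all_dists = [
    [[0.7, 0.3], [0.7, 0.3]],  # node 0: teachers agree
    [[0.9, 0.1], [0.1, 0.9]],  # node 1: teachers disagree
    [[1.0, 0.0], [0.0, 1.0]],  # node 2: teachers fully disagree
]
print(rank_by_uncertainty(all_dists, budget=2))  # [2, 1]

# Jensen-Shannon identity: H(M) - mean_i H(P_i) equals delta_v.
d = all_dists[1]
jsd = entropy(mean_distribution(d)) - sum(entropy(p) for p in d) / len(d)
print(abs(node_uncertainty(d) - jsd) < 1e-9)  # True
```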

Figures (7)

  • Figure 1: Overview of PKD. The framework has two key modules: the GNN-preference-driven Node Selector (GNS) and the Node-preference-driven GNN Selector (NGS). Before GNS starts, we fine-tune the LLM with GTA prompts so that it can comprehend graph properties. In the GNS module, we select nodes using the proposed $K$-uncertainty, which is based on node prediction uncertainty in each teacher GNN's embedding space. To exploit the LLM effectively when annotating the selected nodes, we combine their semantic attributes with structural attributes derived from the proposed Distance-based Neighbor Selector (DNS) module to construct prompts, promoting prediction distillation from the fine-tuned LLM to the teacher GNNs ($T_{1}, T_{2},\ldots,T_{B}$). In the NGS module, we select the most appropriate teacher GNN for each node for tailored knowledge distillation. This teacher selection is achieved by reinforcement learning, with the fine-tuned LLM as the agent.
  • Figure 2: The performance improvements in zero-shot node classification on homophily and heterophily graphs.
  • Figure 3: Example on the Cora dataset. Starting from the arrow and proceeding counterclockwise, the KL divergence sum gradually increases and the triangles darken. The length of each triangle indicates the number of nodes whose KL divergence sum falls within a given range; $N_{P}$ denotes the number of classes predicted by the teacher GNNs.
  • Figure 4: Comparison of different rewards. Our method performs best (green curve) when all three reward components are included simultaneously.
  • Figure 5: Effect of $K$ on homophily and heterophily graphs. With $K = 4$, the fine-tuned LLM achieves its highest zero-shot node classification accuracy on most graphs.
  • ...and 2 more figures
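
The Figure 1 caption describes building each annotation prompt from a node's own text plus structural information from selected neighbors. The Python sketch below illustrates one way this could look, under loudly stated assumptions: "distance-based" is taken to mean ranking a node's graph neighbors by cosine distance between node embeddings and keeping the `k` closest, and the prompt wording, `select_neighbors`, `build_prompt`, and all data are hypothetical. It is not the paper's DNS module or its GTA prompt template.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def select_neighbors(node: int, adjacency: Dict[int, List[int]],
                     embeddings: Dict[int, List[float]], k: int) -> List[int]:
    """Assumed DNS stand-in: keep the k graph neighbors closest to `node` in embedding space."""
    neighbors = adjacency.get(node, [])
    ranked = sorted(neighbors, key=lambda u: cosine_distance(embeddings[node], embeddings[u]))
    return ranked[:k]


def build_prompt(node: int, texts: Dict[int, str], adjacency: Dict[int, List[int]],
                 embeddings: Dict[int, List[float]], k: int, classes: List[str]) -> str:
    """Hypothetical prompt: target text + selected neighbor texts + candidate classes."""
    lines = [f"Target node: {texts[node]}", "Selected neighbors:"]
    for u in select_neighbors(node, adjacency, embeddings, k):
        lines.append(f"- {texts[u]}")
    lines.append("Which category does the target node belong to? Options: " + ", ".join(classes))
    return "\n".join(lines)


adjacency = {0: [1, 2, 3]}
embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.1], 3: [-1.0, 0.0]}
texts = {0: "Paper on graph neural networks", 1: "Paper on databases",
         2: "Paper on message passing", 3: "Paper on medieval history"}

print(select_neighbors(0, adjacency, embeddings, k=2))  # [2, 1]
print(build_prompt(0, texts, adjacency, embeddings, k=2, classes=["ML", "DB"]))
```

Keeping only the `k` closest neighbors bounds the prompt length by `k` regardless of a node's degree, which matters when every selected node costs one LLM call.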

Theorems & Definitions (3)

  • Proposition 3.1
  • proof
  • Definition D.1