Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs

Taiqiang Wu, Zhe Zhao, Jiahao Wang, Xingyu Bai, Lei Wang, Ngai Wong, Yujiu Yang

TL;DR

This work tackles the latency gap between high-accuracy GNNs and fast MLPs by introducing Prototype-Guided Knowledge Distillation (PGKD), an edge-free method that imbues MLPs with graph structure awareness through class prototypes. By analyzing intra-class and inter-class graph edges, PGKD defines two prototype-based losses that mimic the effects of graph connectivity on GNNs, enabling structure-aware MLPs without using edge data during distillation. Across seven benchmarks and both transductive and inductive settings, PGKD consistently outperforms the edge-free GLNN baseline and often rivals or surpasses some GNN teachers, while exhibiting improved robustness to noisy node features. The results demonstrate the practical potential of edge-free, structure-aware distillation for scalable graph learning and suggest directions for extending prototype-guided approaches to other graph tasks.
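The TL;DR names an intra-class and an inter-class prototype loss but gives no formulas. The sketch below is a hypothetical illustration, not PGKD's published definitions. It assumes three things. A class prototype is the mean embedding of the nodes carrying that label. The inter-class term matches the cosine-similarity matrix among prototypes. The intra-class term matches each node's cosine similarity to its own class prototype. Comparing similarities rather than raw vectors lets the teacher and student embeddings have different widths. All function names are illustrative, and `labels` is assumed to hold one class index per node.

```python
import numpy as np


def class_prototypes(h, labels, num_classes):
    """Prototype c_k = mean embedding of the nodes labeled k (zero vector if class k is empty)."""
    protos = np.zeros((num_classes, h.shape[1]))
    for k in range(num_classes):
        members = h[labels == k]
        if len(members) > 0:
            protos[k] = members.mean(axis=0)
    return protos


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length; all-zero rows stay zero."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def inter_class_loss(h_t, h_s, labels, num_classes):
    """Hypothetical inter-class term: MSE between the teacher's and the student's
    prototype-to-prototype cosine-similarity matrices."""
    p_t = l2_normalize(class_prototypes(h_t, labels, num_classes))
    p_s = l2_normalize(class_prototypes(h_s, labels, num_classes))
    return float(np.mean((p_t @ p_t.T - p_s @ p_s.T) ** 2))


def intra_class_loss(h_t, h_s, labels, num_classes):
    """Hypothetical intra-class term: MSE between the teacher's and the student's
    cosine similarity of each node to its own class prototype."""
    p_t = l2_normalize(class_prototypes(h_t, labels, num_classes))
    p_s = l2_normalize(class_prototypes(h_s, labels, num_classes))
    sim_t = np.sum(l2_normalize(h_t) * p_t[labels], axis=1)
    sim_s = np.sum(l2_normalize(h_s) * p_s[labels], axis=1)
    return float(np.mean((sim_t - sim_s) ** 2))
```

In this sketch, `h_t` would come from the frozen GNN teacher and `h_s` from the MLP applied to node features alone. Computing the student side therefore never touches the edges.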

Abstract

Distilling high-accuracy Graph Neural Networks (GNNs) into low-latency multilayer perceptrons (MLPs) on graph tasks has become an active research topic. However, conventional MLP learning relies almost exclusively on graph nodes and fails to capture graph structural information effectively. Previous methods address this issue by turning graph edges into extra inputs for MLPs, but such graph structure may be unavailable in many scenarios. To this end, we propose Prototype-Guided Knowledge Distillation (PGKD), which does not require graph edges (edge-free setting) yet learns structure-aware MLPs. Our insight is to distill graph structural information from GNNs. Specifically, we first use class prototypes to analyze the impact of graph structure on GNN teachers, and then design two losses to distill this information from GNNs to MLPs. Experimental results on popular graph benchmarks demonstrate the effectiveness and robustness of the proposed PGKD.
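In the edge-free setting, the student sees only node features, so any structural knowledge it gains must arrive through the teacher's outputs. The following is a minimal sketch of a soft-label distillation objective in the style of the GLNN baseline. It assumes cross-entropy on labeled nodes blended with a KL divergence to the teacher's softened predictions on all nodes. The weight `lam`, the temperature `tau`, and the function names are illustrative choices, not values from the paper.

```python
import numpy as np


def softmax(logits, tau=1.0):
    """Row-wise softmax with temperature tau (max-shifted for numerical stability)."""
    z = logits / tau
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def edge_free_kd_loss(student_logits, teacher_logits, labels, labeled_idx, lam=0.5, tau=1.0):
    """Assumed GLNN-style objective:
        lam * CE(student, labels) averaged over labeled nodes
      + (1 - lam) * KL(teacher || student) averaged over all nodes.
    student_logits come from an MLP applied to node features only (no edges);
    teacher_logits are precomputed by the GNN, which did use the edges."""
    eps = 1e-12
    p_s = softmax(student_logits, tau)
    p_t = softmax(teacher_logits, tau)
    # Cross-entropy on the labeled nodes, using the untempered student distribution.
    ce = -np.mean(np.log(softmax(student_logits)[labeled_idx, labels[labeled_idx]] + eps))
    # KL divergence from the teacher's soft labels to the student's, per node, then averaged.
    kl = np.mean(np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1))
    return float(lam * ce + (1.0 - lam) * kl)
```

Because the teacher's logits can be precomputed once, inference needs only the MLP and the node features. This is where the latency advantage of the distilled student comes from.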

Paper Structure (38 sections, 9 equations, 4 figures, 12 tables)

Figures (4)

  • Figure 1: Overview of the proposed PGKD. The GNN teacher takes the whole graph as input, whereas the MLP student takes only the corresponding graph nodes. Circles represent node vectors, and the same color denotes the same class. After obtaining the class prototypes, we design inter-class and intra-class losses to distill graph structural information from the GNN teacher to the MLP student.
  • Figure 2: Performance of the GNN teacher and of MLP students distilled via GLNN and PGKD when different noise is added to the initial node features. The GNN teachers are SAGE, GAT, GCN, and APPNP, respectively. Upper: Cora dataset, transductive setting. Lower: Pubmed dataset, inductive setting. When noise is added to the node features, PGKD drops only slightly while GLNN drops considerably, showing the stronger denoising ability of PGKD.
  • Figure 3: t-SNE visualization of node representations from the GNN teacher, a vanilla MLP, and MLPs distilled via GLNN and PGKD. PGKD learns both the class prototype distribution and the same-class feature distribution well. Upper: Cora dataset. Lower: Citeseer dataset.
  • Figure 4: Performance of the GNN teacher and of MLP students distilled via GLNN and PGKD under the inductive setting with different split ratios. Left: Citeseer dataset with SAGE as the GNN teacher. Right: Pubmed dataset with GCN as the GNN teacher.
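The Figure 2 caption does not state how the feature noise is generated. One simple way to run such a robustness sweep is assumed here rather than taken from the paper. It blends the features with i.i.d. standard Gaussian noise, x_noisy = (1 - alpha) * x + alpha * eps, where alpha in [0, 1] sets the noise level. `add_feature_noise` is a hypothetical helper.

```python
import numpy as np


def add_feature_noise(x, alpha, seed=0):
    """Hypothetical noise model: blend node features with i.i.d. N(0, 1) noise.
    alpha = 0 returns the clean features; alpha = 1 returns pure noise."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(x.shape)
    return (1.0 - alpha) * x + alpha * noise


# Usage sketch: build one noisy copy of the features per noise level; each copy
# would then be fed to the teacher and to the distilled students for evaluation.
features = np.ones((4, 3))
noisy_versions = {a: add_feature_noise(features, a) for a in (0.0, 0.3, 0.6, 0.9)}
```

Fixing the seed keeps the noise identical across the teacher and the students at a given alpha. Any accuracy gap between them then reflects the models rather than different noise draws.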