Heuristic Methods are Good Teachers to Distill MLPs for Graph Link Prediction

Zongyue Qin, Shichang Zhang, Mingxuan Ju, Tong Zhao, Neil Shah, Yizhou Sun

TL;DR

This work investigates GNN-to-MLP distillation for graph link prediction and finds that stronger teachers do not always yield better MLP students, owing to capacity and feature-alignment constraints. It formalizes teachable knowledge and shows that heuristics, despite lower standalone accuracy, can provide complementary, learnable signals that improve MLP performance at far lower training cost. Building on this, the authors propose Ensemble Heuristic-Distilled MLPs (EHDM), which combines multiple heuristic-distilled MLPs through a feature-based gating network to produce a graph-free ensemble. Across ten datasets, EHDM achieves an average 7.93% improvement over prior GNN-to-MLP approaches while reducing training time by 1.95–3.32×, demonstrating practical, scalable link prediction with minimal graph dependency.
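To make the gating idea concrete, here is a minimal sketch of how a feature-based gate might combine the scores of several heuristic-distilled students. It is an illustration under stated assumptions, not the authors' implementation: the linear gate, the softmax normalization, the weighted-sum combination, and the names `gate_weights`, `ensemble_score`, and `gate_matrix` are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(x_i, x_j, gate_matrix):
    """Hypothetical linear gate: one weight row per student MLP.

    The gate sees only the two nodes' features (no graph structure),
    which is what keeps the ensemble graph-free in this sketch.
    """
    pair = list(x_i) + list(x_j)  # concatenate the two feature vectors
    logits = [sum(w * f for w, f in zip(row, pair)) for row in gate_matrix]
    return softmax(logits)


def ensemble_score(x_i, x_j, student_scores, gate_matrix):
    """Assumed combination rule: gate-weighted sum of student link scores."""
    weights = gate_weights(x_i, x_j, gate_matrix)
    return sum(w * s for w, s in zip(weights, student_scores))


# Two students, two-dimensional node features, an all-zero gate (uniform weights).
score = ensemble_score(
    [1.0, 0.0], [0.0, 1.0],
    student_scores=[0.2, 0.8],
    gate_matrix=[[0.0] * 4, [0.0] * 4],
)
```

With an all-zero gate the weights are uniform, so the ensemble reduces to a plain average of the student scores; a trained gate would instead shift weight toward whichever student is more reliable for a given node pair.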

Abstract

Link prediction is a crucial graph-learning task with applications including citation prediction and product recommendation. Distilling Graph Neural Network (GNN) teachers into Multi-Layer Perceptron (MLP) students has emerged as an effective approach to achieving strong performance while reducing computational cost by removing graph dependency. However, existing distillation methods use only standard GNNs and overlook alternative teachers such as specialized models for link prediction (GNN4LP) and heuristic methods (e.g., common neighbors). This paper first explores the impact of different teachers in GNN-to-MLP distillation. Surprisingly, we find that stronger teachers do not always produce stronger students: MLPs distilled from GNN4LP can underperform those distilled from simpler GNNs, while weaker heuristic methods can teach MLPs to near-GNN performance with drastically reduced training costs. Building on these insights, we propose Ensemble Heuristic-Distilled MLPs (EHDM), which eliminates graph dependencies while effectively integrating complementary signals via a gating mechanism. Experiments on ten datasets show an average 7.93% improvement over previous GNN-to-MLP approaches with 1.95–3.32× less training time, indicating that EHDM is an efficient and effective link prediction method.
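As a rough illustration of what using a heuristic as a teacher could look like, the sketch below computes the common-neighbors score for a node pair and a KL-divergence distillation loss against a student's predicted link probability. Several pieces are assumptions made only for this example: the `squash` mapping from a neighbor count to a probability, the Bernoulli form of the KL term, and the toy adjacency dictionary. The paper's actual loss and score normalization are not given in this section.

```python
import math


def common_neighbors(adj, i, j):
    """Common-neighbors (CN) score: number of nodes adjacent to both i and j."""
    return len(adj[i] & adj[j])


def squash(count):
    """Hypothetical mapping from a non-negative count to a value in [0, 1)."""
    return count / (count + 1.0)


def bernoulli_kl(p_teacher, q_student, eps=1e-12):
    """KL(teacher || student) between two Bernoulli link distributions."""
    p = min(max(p_teacher, eps), 1.0 - eps)  # clamp to avoid log(0)
    q = min(max(q_student, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


# Toy undirected graph stored as node -> set of neighbors.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

# Nodes 0 and 1 share exactly one neighbor (node 2).
teacher_prob = squash(common_neighbors(adj, 0, 1))

# Distillation loss for a student that predicts a link with probability 0.9.
loss = bernoulli_kl(teacher_prob, 0.9)
```

The teacher signal here needs only set intersections over adjacency lists and no model training, which is consistent with the abstract's point that heuristic teachers are far cheaper than GNN teachers.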

Paper Structure

This paper contains 27 sections, 2 theorems, 18 equations, 7 figures, 16 tables.

Key Result

Lemma 3.1

Let $F(y\mid \bm{x}_i,\bm{x}_j,s_i,s_j)$ be a teacher model, and let $g(y\mid \bm{x}_i,\bm{x}_j)$ be a student model. Suppose distillation is performed using KL divergence as the loss. Then,

Figures (7)

  • Figure 1: Hits@20 of standard GNNs (SAGE, GAT), a GNN4LP model (NCN), and their student MLPs across five datasets.
  • Figure 2: Exploration of heuristic methods as teachers to distill MLPs. (a) Hits@10 comparison of CN, MLP, and CN-distilled MLP, demonstrating performance improvements through heuristic distillation even when the heuristic method underperforms. (b) Time breakdown of GNN-to-MLP distillation, showing the computational burden of GNN training. (c) Relative time cost for generating guidance with different teachers, illustrating the efficiency advantage of heuristic methods.
  • Figure 3: Subset ratio across three datasets: the proportion of positive edges identified by heuristic methods that are also recognized by MLPs (without distillation) and by heuristic-distilled MLPs (H-MLP).
  • Figure 4: Accuracy rankings (Hits@20) of three teachers (CN, CSP, CN+CSP) and corresponding student MLPs across four datasets.
  • Figure 5: Overlap ratio of student MLPs distilled from different heuristic methods.
  • ...and 2 more figures
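
Several of the figures above report Hits@K (Hits@10, Hits@20). For readers unfamiliar with the metric, the following sketch uses the definition common in link prediction benchmarks: the fraction of positive edges scored strictly higher than the K-th highest-scoring negative edge. The paper's exact evaluation protocol is not stated in this section, so treat this as an assumed definition; the function name `hits_at_k` and the sample scores are illustrative only.

```python
def hits_at_k(pos_scores, neg_scores, k):
    """Fraction of positive edges ranked above the k-th best negative edge."""
    if len(neg_scores) < k:
        # Fewer than k negatives: every positive trivially counts as a hit.
        return 1.0
    kth_best_negative = sorted(neg_scores, reverse=True)[k - 1]
    hits = sum(1 for s in pos_scores if s > kth_best_negative)
    return hits / len(pos_scores)


# Three positive edges and four negative edges with made-up scores.
value = hits_at_k([0.9, 0.4, 0.7], [0.8, 0.5, 0.3, 0.1], k=2)
```

In this toy case the second-highest negative score is 0.5, and two of the three positives exceed it.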

Theorems & Definitions (5)

  • Definition 1: Teachable Knowledge
  • Lemma 3.1
  • Theorem 3.2: GNN4LP models are not better teachers
  • proof
  • proof