Heuristic Methods are Good Teachers to Distill MLPs for Graph Link Prediction
Zongyue Qin, Shichang Zhang, Mingxuan Ju, Tong Zhao, Neil Shah, Yizhou Sun
TL;DR
This work investigates GNN-to-MLP distillation for graph link prediction and finds that stronger teachers do not always yield better MLP students, owing to capacity and feature-alignment constraints. It formalizes teachable knowledge and shows that heuristics, despite lower standalone accuracy, can provide complementary, learnable signals that improve MLP performance at a much lower training cost. Building on this, the authors propose Ensemble Heuristic-Distilled MLPs (EHDM), which combines multiple heuristic-distilled MLPs through a feature-based gating network to produce a graph-free ensemble. Across ten datasets, EHDM achieves an average 7.93% improvement over prior GNN-to-MLP approaches while reducing training time by 1.95–3.32×, demonstrating practical, scalable link prediction with minimal graph dependency.
Abstract
Link prediction is a crucial graph-learning task with applications including citation prediction and product recommendation. Distilling Graph Neural Network (GNN) teachers into Multi-Layer Perceptron (MLP) students has emerged as an effective approach that achieves strong performance while reducing computational cost by removing graph dependency. However, existing distillation methods use only standard GNNs and overlook alternative teachers such as specialized models for link prediction (GNN4LP) and heuristic methods (e.g., common neighbors). This paper first explores the impact of different teachers in GNN-to-MLP distillation. Surprisingly, we find that stronger teachers do not always produce stronger students: MLPs distilled from GNN4LP can underperform those distilled from simpler GNNs, while weaker heuristic methods can teach MLPs to near-GNN performance with drastically reduced training costs. Building on these insights, we propose Ensemble Heuristic-Distilled MLPs (EHDM), which eliminates graph dependencies while effectively integrating complementary signals via a gating mechanism. Experiments on ten datasets show an average 7.93% improvement over previous GNN-to-MLP approaches with 1.95–3.32 times less training time, indicating that EHDM is an efficient and effective link prediction method.
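
To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows a common-neighbors heuristic that could serve as a teacher signal, and a softmax-gated combination of per-heuristic student scores. The function names (`common_neighbors`, `gated_ensemble`), the toy graph, and the hard-coded scores and gate logits are all assumptions for illustration; in the method described above, student scores would come from MLPs and the gate weights from a gating network, both of which are omitted here.

```python
import math

def common_neighbors(adj, u, v):
    """Heuristic teacher score: number of neighbors shared by u and v.

    `adj` maps each node to the set of its neighbors (hypothetical format).
    """
    return len(adj[u] & adj[v])

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gated_ensemble(student_scores, gate_logits):
    """Weighted sum of student scores, with weights given by a softmax gate.

    `student_scores[i]` stands in for the output of the i-th heuristic-distilled
    MLP for one candidate link; `gate_logits[i]` stands in for the gating
    network's output. Both are placeholders in this sketch.
    """
    weights = softmax(gate_logits)
    return sum(w * s for w, s in zip(weights, student_scores))

# Toy undirected graph (assumed data, for illustration only).
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}

# Teacher signal for the candidate link (1, 3): nodes 0 and 2 are shared.
cn_score = common_neighbors(adj, 1, 3)

# Two placeholder student scores combined with equal gate logits.
combined = gated_ensemble([0.9, 0.1], [0.0, 0.0])
```

The graph is consulted only when computing the heuristic score that supervises a student. The gated combination itself takes no graph structure as input, which is the property the abstract refers to as removing graph dependency.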
