AdaGMLP: AdaBoosting GNN-to-MLP Knowledge Distillation

Weigang Lu; Ziyu Guan; Wei Zhao; Yaming Yang

TL;DR

This work tackles the latency constraints of deploying graph models by distilling GNN knowledge into efficient MLP students through a GNN-to-MLP (G2M) knowledge distillation (KD) framework. AdaGMLP uses an AdaBoost-style ensemble of MLP students with Random Classification and a Node Alignment module to combat insufficient training data and incomplete test data, respectively. Empirical results across seven benchmarks show AdaGMLP outperforms existing G2M KD methods and remains competitive with GNN teachers, especially in edge scenarios with limited labels or missing features. The approach offers a practical pathway to robust, scalable graph inference on latency-sensitive devices, and the code is open source for reproducibility.

Abstract

Graph Neural Networks (GNNs) have revolutionized graph-based machine learning, but their heavy computational demands pose challenges for latency-sensitive edge devices in practical industrial applications. In response, a new wave of methods, collectively known as GNN-to-MLP (G2M) Knowledge Distillation, has emerged. They aim to transfer GNN-learned knowledge to a more efficient MLP student, which offers faster, resource-efficient inference while maintaining competitive performance compared to GNNs. However, these methods face significant challenges in situations with insufficient training data and incomplete test data, limiting their real-world applicability. To address these challenges, we propose AdaGMLP, an AdaBoosting GNN-to-MLP Knowledge Distillation framework. It leverages an ensemble of diverse MLP students trained on different subsets of labeled nodes, addressing the issue of insufficient training data. Additionally, it incorporates a Node Alignment technique for robust predictions on test data with missing or incomplete features. Our experiments on seven benchmark datasets with different settings demonstrate that AdaGMLP outperforms existing G2M methods, making it suitable for a wide range of latency-sensitive real-world applications. Our code is available on GitHub (https://github.com/WeigangLu/AdaGMLP-KDD24).
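The boosting idea in the abstract can be illustrated with a small, generic sketch. It is not the authors' implementation: the exponential reweighting rule, the `rate` parameter, and the function names `reweight_nodes` and `ensemble_predict` are assumptions chosen for illustration. The sketch only shows the general pattern of up-weighting nodes where a student disagrees with the teacher, then combining several students with per-student weights.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def reweight_nodes(weights, teacher_probs, student_probs, rate=1.0):
    """Hypothetical boosting-style update: nodes where the student diverges
    from the teacher receive larger weights for the next student."""
    divs = [kl_divergence(t, s) for t, s in zip(teacher_probs, student_probs)]
    raw = [w * math.exp(rate * d) for w, d in zip(weights, divs)]
    total = sum(raw)
    return [r / total for r in raw]  # renormalize so weights sum to 1

def ensemble_predict(student_probs_list, alphas):
    """Weighted average of per-student class probabilities for one node."""
    total = sum(alphas)
    n_classes = len(student_probs_list[0])
    return [
        sum(a * probs[c] for a, probs in zip(alphas, student_probs_list)) / total
        for c in range(n_classes)
    ]

# Toy data: two nodes, two classes. The student matches the teacher on
# node 0 and disagrees on node 1.
teacher = [[0.9, 0.1], [0.2, 0.8]]
student = [[0.9, 0.1], [0.6, 0.4]]
new_w = reweight_nodes([0.5, 0.5], teacher, student)
combined = ensemble_predict([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```

In this toy run the poorly matched node ends up with the larger weight, so a subsequent student would focus more on it.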

Paper Structure (27 sections, 16 equations, 7 figures, 6 tables, 1 algorithm)

Introduction
Related Works
Preliminaries
Motivation
Challenges in Existing G2M KD
Towards Addressing these Challenges
Methodology
Random Classification
Node Alignment
AdaBoosting Knowledge Distillation
Training and Inference
Complexity
Experiments
Experiment Setting
Classification Performance Comparison (Q1)
...and 12 more sections

Figures (7)

Figure 1: [Challenge 1] Insufficient Training Data. G2M methods with a single MLP student exhibit higher sensitivity to changes in label rates than vanilla GNNs. Notably, as the label rate decreases, both the box heights and the distance between outliers and box boundaries tend to increase.
Figure 2: [Challenge 2] Incomplete Test Data. Traditional G2M methods suffer consistent performance drops as more features are missing. Our AdaGMLP maintains a high accuracy level, outperforming other G2M methods as the fraction of missing features increases.
Figure 3: Illustration of AdaGMLP. In (a), for each MLP, we compute the KL loss using node weights, which are determined by the difference between the MLP outputs and the corresponding GNN outputs (Knowledge Distillation). Additionally, we calculate the CE loss by comparing the predictions for the sampled labeled nodes with their respective ground-truth labels (Random Classification). In (b), we begin by obtaining incomplete nodes by randomly masking the features of the selected nodes and inputting them into the MLP. Subsequently, we employ Mean Squared Error (MSE) loss to align the hidden representations and outputs of the complete and incomplete nodes (Node Alignment).
Figure 4: Hyper-parameter Analysis on $\lambda$, $\lambda_{\mathrm{NA}}$, and $\beta$.
Figure 5: Ensemble Size ($K$) Analysis.
...and 2 more figures
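The Node Alignment step described in the Figure 3 caption can be sketched roughly as follows. This is an illustrative assumption rather than the paper's code: masking is done here by zeroing features, the MLP has a single hidden layer with ReLU, and the names `mask_features`, `mlp_forward`, and `alignment_loss` as well as the toy weights are hypothetical. The sketch compares hidden representations and outputs for a complete node and its masked copy using MSE.

```python
import random

def mask_features(x, mask_rate, rng):
    """Zero out each feature independently with probability mask_rate
    (zero-masking is an assumption made for this sketch)."""
    return [0.0 if rng.random() < mask_rate else v for v in x]

def linear(x, W, b):
    """Dense layer: one output per row of W."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP returning (hidden representation, output)."""
    hidden = [max(0.0, h) for h in linear(x, W1, b1)]  # ReLU
    output = linear(hidden, W2, b2)
    return hidden, output

def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def alignment_loss(x, params, mask_rate, rng):
    """MSE between hidden states plus MSE between outputs, computed for
    the complete node and its randomly masked copy."""
    h_full, o_full = mlp_forward(x, *params)
    h_mask, o_mask = mlp_forward(mask_features(x, mask_rate, rng), *params)
    return mse(h_full, h_mask) + mse(o_full, o_mask)

rng = random.Random(0)
# Toy parameters (W1, b1, W2, b2): 3 input features, 2 hidden units, 2 outputs.
params = ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1],
          [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
x = [1.0, 2.0, 3.0]
loss_none = alignment_loss(x, params, 0.0, rng)  # nothing masked
loss_all = alignment_loss(x, params, 1.0, rng)   # every feature masked
```

With no masking the two forward passes are identical and the loss is zero; with masking it becomes positive, which is the signal a training loop would minimize so that predictions stay stable when features are missing.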
