Learning Accurate, Efficient, and Interpretable MLPs on Multiplex Graphs via Node-wise Multi-View Ensemble Distillation

Yunhui Liu, Zhen Tao, Xiang Zhao, Jianhua Zhao, Tao Zheng, Tieke He

TL;DR

This work tackles the latency bottleneck of multiplex graph neural networks (MGNNs) by distilling their knowledge into compact, fast-inference MLPs. It introduces MGFNN, which trains a student MLP on soft labels from a teacher MGNN, and MGFNN+, which adds node-wise multi-view ensemble distillation and uses a low-rank reparameterization to learn node-dependent coefficients. On average, the approach achieves about 10% higher accuracy than vanilla MLPs, with up to 89× faster inference than MGNNs, and the node-wise coefficients learned by MGFNN+ are interpretable. The results show gains in accuracy and efficiency across six real-world multiplex datasets, with code released for reproducibility and deployment in latency-sensitive applications.

Abstract

Multiplex graphs, with multiple edge types (graph views) among common nodes, provide richer structural semantics and better modeling capabilities. Multiplex Graph Neural Networks (MGNNs), typically comprising view-specific GNNs and a multi-view integration layer, have achieved advanced performance in various downstream tasks. However, their reliance on neighborhood aggregation poses challenges for deployment in latency-sensitive applications. Motivated by recent GNN-to-MLP knowledge distillation frameworks, we propose Multiplex Graph-Free Neural Networks (MGFNN and MGFNN+) to combine MGNNs' superior performance and MLPs' efficient inference via knowledge distillation. MGFNN directly trains student MLPs with node features as input and soft labels from teacher MGNNs as targets. MGFNN+ further employs a low-rank approximation-based reparameterization to learn node-wise coefficients, enabling an adaptive ensemble of the knowledge from each view-specific GNN. This node-wise multi-view ensemble distillation strategy allows student MLPs to learn more informative multiplex semantic knowledge for different nodes. Experiments show that MGFNNs achieve average accuracy improvements of about 10% over vanilla MLPs and perform comparably to or even better than teacher MGNNs (accurate); MGFNNs achieve a 35.40$\times$ to 89.14$\times$ speedup in inference over MGNNs (efficient); and MGFNN+ adaptively assigns different coefficients for multi-view ensemble distillation to different nodes (interpretable).
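
The node-wise ensemble distillation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give the exact parameterization, so the factor matrices `A` and `B`, the softmax over views, and the KL-divergence objective are assumptions made for illustration, and all function names are hypothetical. The idea shown is that an $N \times V$ coefficient matrix (nodes by views) is produced from two small factors of rank $r$, and the coefficients weight the view-specific teacher predictions into one soft target per node.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def node_wise_coefficients(A, B):
    """Assumed low-rank form: (N, r) @ (r, V) -> (N, V), normalized over views."""
    return softmax(A @ B, axis=1)


def ensemble_soft_labels(view_logits, alpha):
    """Combine view-specific teacher predictions per node.

    view_logits: (V, N, C) logits from V view-specific GNNs.
    alpha:       (N, V) node-wise coefficients, each row summing to 1.
    Returns (N, C) soft targets.
    """
    view_probs = softmax(view_logits, axis=-1)
    return np.einsum("nv,vnc->nc", alpha, view_probs)


def distillation_loss(student_logits, soft_targets, eps=1e-12):
    """Mean KL(soft_targets || student) over nodes (assumed objective)."""
    log_q = np.log(softmax(student_logits, axis=-1) + eps)
    log_p = np.log(soft_targets + eps)
    return float(np.mean(np.sum(soft_targets * (log_p - log_q), axis=1)))


# Toy sizes (hypothetical): 5 nodes, 3 views, 4 classes, rank 2.
rng = np.random.default_rng(0)
N, V, C, r = 5, 3, 4, 2
view_logits = rng.normal(size=(V, N, C))  # stand-in for teacher outputs
A = rng.normal(size=(N, r))               # node-side factor
B = rng.normal(size=(r, V))               # view-side factor

alpha = node_wise_coefficients(A, B)
targets = ensemble_soft_labels(view_logits, alpha)
student_logits = rng.normal(size=(N, C))  # stand-in for MLP outputs
loss = distillation_loss(student_logits, targets)
```

Under these assumptions, the plain MGFNN setting corresponds to calling `distillation_loss` with the softmax of a single teacher MGNN's logits as `soft_targets`. In practice the factors and the student MLP would be trained with an autodiff framework; the sketch only shows the forward computation.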

Paper Structure

This paper contains 30 sections, 6 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: The number of nodes fetched and the inference time of MGNNs are both orders of magnitude greater than those of MLPs and grow exponentially with the number of layers. (a) The total number of nodes fetched for inference. (b) The total inference time. (Inductive inference for $10$ random nodes on MAG.)
  • Figure 2: Classification accuracy of MGNN, each view-specific GNN, and the ideal ensemble classifier on ACM, IMDB, and MAG.
  • Figure 3: Visualization of learned node-wise ensemble coefficients for 6 randomly selected nodes on ACM, IMDB, and MAG.
  • Figure 4: Transductive Accuracy vs. Teacher MGNN Architectures. MGFNNs can learn from different MGNN teachers to improve over MLPs and achieve comparable results.
  • Figure 5: Accuracy vs. Inductive:Transductive Ratio under the production setting.
  • ...and 1 more figure