Learning Accurate, Efficient, and Interpretable MLPs on Multiplex Graphs via Node-wise Multi-View Ensemble Distillation
Yunhui Liu, Zhen Tao, Xiang Zhao, Jianhua Zhao, Tao Zheng, Tieke He
TL;DR
This work tackles the inference-latency bottleneck of multiplex graph neural networks (MGNNs) by distilling their knowledge into compact, fast-inference MLPs. It introduces MGFNN, which trains a student MLP on soft labels from a teacher MGNN, and MGFNN+, which adds node-wise multi-view ensemble distillation, using a low-rank reparameterization to learn node-dependent coefficients over the view-specific GNNs. The approach achieves about 10% higher average accuracy than vanilla MLPs and up to 89× faster inference than MGNNs, and the coefficients learned by MGFNN+ show how much each view contributes for each node, which makes the distillation interpretable. Experiments on six real-world multiplex datasets support these gains in accuracy and efficiency, and the code is released for reproducibility and for deployment in latency-sensitive applications.
Abstract
Multiplex graphs, with multiple edge types (graph views) among common nodes, provide richer structural semantics and better modeling capabilities. Multiplex Graph Neural Networks (MGNNs), typically comprising view-specific GNNs and a multi-view integration layer, have achieved strong performance in various downstream tasks. However, their reliance on neighborhood aggregation poses challenges for deployment in latency-sensitive applications. Motivated by recent GNN-to-MLP knowledge distillation frameworks, we propose Multiplex Graph-Free Neural Networks (MGFNN and MGFNN+) to combine MGNNs' superior performance and MLPs' efficient inference via knowledge distillation. MGFNN directly trains student MLPs with node features as input and soft labels from teacher MGNNs as targets. MGFNN+ further employs a low-rank approximation-based reparameterization to learn node-wise coefficients, enabling an adaptive ensemble of the knowledge from each view-specific GNN. This node-wise multi-view ensemble distillation strategy allows student MLPs to learn more informative multiplex semantic knowledge for different nodes. Experiments show that MGFNNs achieve average accuracy improvements of about 10% over vanilla MLPs and perform comparably to, or even better than, teacher MGNNs (accurate); MGFNNs achieve a 35.40$\times$ to 89.14$\times$ speedup in inference over MGNNs (efficient); and MGFNN+ adaptively assigns different coefficients to different nodes for multi-view ensemble distillation (interpretable).
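The node-wise ensemble distillation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the coefficients are parameterized as a row-wise softmax over the product of two small matrices U (nodes × rank) and V (rank × views), the ensembled target is the coefficient-weighted mix of the view-specific GNN predictions, and the student is trained with a KL-divergence loss. All function and variable names are hypothetical and are not taken from the released code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def node_wise_coefficients(U, V):
    """Assumed low-rank form: scores = U @ V, with U of shape (nodes, rank)
    and V of shape (rank, views). A row-wise softmax turns each node's
    scores into view coefficients that sum to 1."""
    num_views = len(V[0])
    coeffs = []
    for u in U:
        scores = [sum(u[j] * V[j][k] for j in range(len(u)))
                  for k in range(num_views)]
        coeffs.append(softmax(scores))
    return coeffs

def ensemble_soft_labels(view_probs, coeffs):
    """view_probs[k][i] is the class distribution that the GNN for view k
    predicts for node i. Returns one mixed distribution per node."""
    num_views = len(view_probs)
    targets = []
    for i, c in enumerate(coeffs):
        num_classes = len(view_probs[0][i])
        targets.append([sum(c[k] * view_probs[k][i][j] for k in range(num_views))
                        for j in range(num_classes)])
    return targets

def distillation_loss(targets, student_logits, eps=1e-12):
    """Mean KL(target || softmax(student_logits)) over nodes (assumed loss)."""
    total = 0.0
    for p, logits in zip(targets, student_logits):
        q = softmax(logits)
        total += sum(pi * math.log((pi + eps) / (qi + eps))
                     for pi, qi in zip(p, q))
    return total / len(targets)

# Toy example: 2 nodes, 2 views, 2 classes, rank 2 (all values made up).
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[2.0, 0.0], [0.0, 2.0]]
view_probs = [
    [[0.9, 0.1], [0.6, 0.4]],  # predictions of the view-0 GNN
    [[0.2, 0.8], [0.3, 0.7]],  # predictions of the view-1 GNN
]
coeffs = node_wise_coefficients(U, V)
targets = ensemble_soft_labels(view_probs, coeffs)
student_logits = [[0.0, 0.0], [0.0, 0.0]]  # untrained student: uniform output
loss = distillation_loss(targets, student_logits)
```

In this toy setup node 0 weights view 0 more heavily and node 1 weights view 1 more heavily, which is the kind of per-node variation the abstract calls interpretable. In practice U, V, and the student MLP would be trained jointly by gradient descent in a deep learning framework; the pure-Python version here only shows the data flow.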
