MuGSI: Distilling GNNs with Multi-Granularity Structural Information for Graph Classification
Tianjun Yao, Jiaqi Sun, Defu Cao, Kun Zhang, Guangyi Chen
TL;DR
MuGSI addresses the gap in applying GNN-to-MLP knowledge distillation (KD) to graph classification by introducing multi-granularity distillation losses that align teacher and student representations at the graph, subgraph, and node levels. These losses are complemented by node-feature augmentation via Laplacian positional encodings (LaPE) and a graph-augmented MLP (GA-MLP) student. The framework defines graph-level, inter-cluster, and path-consistency losses, which provide dense supervision and transfer structural information effectively while keeping inference fast. Empirical results across multiple datasets show that MuGSI improves over strong baselines, with GA-MLP students achieving performance competitive with or superior to their teachers and remaining robust under dynamic graph changes. The approach provides a practical, model-agnostic KD pathway for efficient graph classification, suitable for resource-constrained deployment and real-time applications.
Abstract
Recent works have introduced GNN-to-MLP knowledge distillation (KD) frameworks that combine the superior performance of GNNs with the fast inference speed of MLPs. However, existing KD frameworks are primarily designed for node classification within single graphs, leaving their applicability to graph classification largely unexplored. Two main challenges arise when extending KD from node classification to graph classification: (1) the inherent sparsity of learning signals, because soft labels are generated at the graph level; and (2) the limited expressiveness of student MLPs, especially on datasets with limited input feature spaces. To overcome these challenges, we introduce MuGSI, a novel KD framework that employs Multi-granularity Structural Information for graph classification. Specifically, we propose a multi-granularity distillation loss in MuGSI to tackle the first challenge. This loss function is composed of three distinct components: graph-level distillation, subgraph-level distillation, and node-level distillation. Each component targets a specific granularity of the graph structure, ensuring a comprehensive transfer of structural knowledge from the teacher model to the student model. To tackle the second challenge, MuGSI incorporates a node feature augmentation component, thereby enhancing the expressiveness of the student MLPs and making them more capable learners. We perform extensive experiments across a variety of datasets and different teacher/student model architectures. The experimental results demonstrate the effectiveness, efficiency, and robustness of MuGSI. Code is publicly available at: https://github.com/tianyao-aka/MuGSI
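
The abstract describes a loss built from graph-level, subgraph-level, and node-level distillation terms but does not give their formulas. The following is only a minimal sketch of how such a combination might look, assuming a standard temperature-softened KL divergence for the graph-level term and a plain weighted sum for the total. The function names, the temperature, and the weights `w_graph`, `w_subgraph`, and `w_node` are hypothetical and are not taken from MuGSI; the subgraph-level and node-level terms are passed in as precomputed numbers because the abstract does not define them.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def graph_level_kd(teacher_logits, student_logits, temperature=2.0):
    """Assumed graph-level term: KL between softened teacher and student outputs.

    There is one prediction per graph, which is why this signal alone is sparse.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student)

def total_loss(ce, l_graph, l_subgraph, l_node,
               w_graph=1.0, w_subgraph=1.0, w_node=1.0):
    """Hypothetical weighted sum of the supervised loss and three distillation terms."""
    return ce + w_graph * l_graph + w_subgraph * l_subgraph + w_node * l_node
```

In this sketch the graph-level term is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge; the subgraph-level and node-level terms would supply the denser supervision that a single graph-level soft label cannot.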
