MuGSI: Distilling GNNs with Multi-Granularity Structural Information for Graph Classification

Tianjun Yao, Jiaqi Sun, Defu Cao, Kun Zhang, Guangyi Chen

TL;DR

MuGSI addresses the gap in applying GNN-to-MLP knowledge distillation to graph classification by introducing multi-granularity distillation losses that align teacher and student representations across graph, subgraph, and node levels, complemented by node-feature augmentation via LaPE and a GA-MLP student. The framework defines graph-level, inter-cluster, and path-consistency losses, enabling dense supervision and effective transfer of structural information while keeping inference fast. Empirical results across multiple datasets show MuGSI improves over strong baselines, with GA-MLP students achieving competitive or superior performance to teachers and exhibiting robustness under dynamic graph changes. This approach provides a practical, model-agnostic KD pathway for efficient graph classification, suitable for resource-constrained deployment and real-time applications.

Abstract

Recent works have introduced GNN-to-MLP knowledge distillation (KD) frameworks to combine the superior performance of GNNs with the fast inference speed of MLPs. However, existing KD frameworks are primarily designed for node classification within single graphs, leaving their applicability to graph classification largely unexplored. Two main challenges arise when extending KD from node classification to graph classification: (1) the inherent sparsity of learning signals, because soft labels are generated at the graph level; and (2) the limited expressiveness of student MLPs, especially on datasets with limited input feature spaces. To overcome these challenges, we introduce MuGSI, a novel KD framework that employs Multi-granularity Structural Information for graph classification. Specifically, we propose a multi-granularity distillation loss in MuGSI to tackle the first challenge. This loss function is composed of three distinct components: graph-level distillation, subgraph-level distillation, and node-level distillation. Each component targets a specific granularity of the graph structure, ensuring a comprehensive transfer of structural knowledge from the teacher model to the student model. To tackle the second challenge, MuGSI incorporates a node feature augmentation component, which enhances the expressiveness of the student MLPs and makes them more capable learners. We perform extensive experiments across a variety of datasets and different teacher/student model architectures. The experimental results demonstrate the effectiveness, efficiency, and robustness of MuGSI. Code is publicly available at: https://github.com/tianyao-aka/MuGSI.
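
To make the first challenge concrete, the sketch below shows a generic soft-label distillation term for a single graph: one teacher logit vector and one student logit vector per graph, which is why the signal is sparse compared with per-node soft labels. This is the standard temperature-scaled KL formulation, not necessarily the exact loss used in MuGSI; the function names, the weight `alpha`, and the `temperature` value are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions.

    Generic soft-label KD term (assumption: not the paper's exact form).
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def cross_entropy(student_logits, label):
    """Ground-truth cross-entropy for one graph with an integer class label."""
    q = softmax(student_logits)
    return -math.log(q[label])


def graph_kd_objective(teacher_logits, student_logits, label,
                       alpha=0.5, temperature=2.0):
    """Hypothetical combined objective: ground-truth loss plus weighted KD term."""
    kd = soft_label_kd_loss(teacher_logits, student_logits, temperature)
    return cross_entropy(student_logits, label) + alpha * kd


# One graph yields exactly one pair of logit vectors, hence one KD signal.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, -0.5]
loss = graph_kd_objective(teacher, student, label=0)
```

Because each graph contributes only one such term, supervision at finer granularities (subgraph and node level) is what supplies the denser signal the abstract refers to.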

Paper Structure (28 sections, 13 equations, 4 figures, 8 tables, 1 algorithm)

Figures (4)

  • Figure 1: The KD process with the multi-granularity distillation loss. First, a teacher GNN model is pre-trained; then an MLP-type student model is trained using the distilled multi-granularity structural knowledge from the teacher model: (a) whole-graph distillation loss $\mathcal{L}_{\mathcal{G}}$; (b) inter-cluster distillation loss $\mathcal{L}_{\mathcal{C}}$; (c) path-consistency loss $\mathcal{L}_{\mathcal{P}}$. Note that the soft logits distillation loss $\mathcal{L}_{SL}$ and the ground-truth cross-entropy loss $\mathcal{L}_{GT}$ are not shown in the figure.
  • Figure 2: Average prediction error and entropy of GIN and MuGSI$_{GA-MLP^*}$ when sequentially inserting 10 nodes back into the graphs. MuGSI$_{GA-MLP^*}$ is more robust and less susceptible to topological changes.
  • Figure 3: Average inference time of GIN and MuGSI$_{GA-MLP^*}$ when sequentially inserting 10 nodes back into the graphs.
  • Figure 4: Mean best accuracy for different hyper-parameter combinations for MuGSI$_{MLP^*}$ and MuGSI$_{GA-MLP^*}$.
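
Figure 2 reports the entropy of model predictions as nodes are re-inserted. The caption does not define the quantity, so the sketch below assumes the usual Shannon entropy (natural logarithm) of the predicted class distribution; `prediction_entropy` is a hypothetical helper name, not taken from the paper's code.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution.

    Assumes `probs` is non-negative and sums to 1; zero-probability
    classes contribute nothing to the sum.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# A confident prediction has low entropy; a uniform one has the maximum.
confident = prediction_entropy([0.98, 0.01, 0.01])
uniform = prediction_entropy([1 / 3, 1 / 3, 1 / 3])
```

Under this definition, lower entropy after a topological change indicates that the model's prediction stayed confident, which is one way to read the robustness comparison in the caption.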