
Diffusion-Assisted Distillation for Self-Supervised Graph Representation Learning with MLPs

Seong Jin Ahn, Myoung-Ho Kim

TL;DR

DAD-SGM introduces a diffusion-based teacher assistant (TA) to bridge the capacity gap when distilling self-supervised GNN knowledge into MLPs. The method first trains an MLP-based denoising diffusion model on the teacher's outputs to predict diffusion noise, and then distills the teacher into a student MLP by aligning diffusion-noise predictions, yielding scalable and robust self-supervised graph representations. Empirically, it achieves gains of up to 15% in node classification and 19% in link prediction over prior GNN-to-MLP distillation methods while keeping inference fast on large graphs, and it improves robustness to noise and adversarial perturbations. The work suggests practical impact for large-scale graph analysis with lightweight models and motivates extensions to heterogeneous graphs via conditional diffusion modeling.
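The first stage, training a denoising diffusion model on the teacher's outputs, can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a standard DDPM forward process with a linear beta schedule, a hypothetical one-hidden-layer MLP denoiser (`mlp_denoiser`) conditioned on node features and a normalized timestep, and the usual epsilon-prediction MSE loss. All function names are illustrative, the teacher embeddings are random stand-ins, and only the loss is computed (no gradient step).

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """DDPM-style linear variance schedule (assumed; the paper's schedule may differ)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(z0, t, alpha_bar, noise):
    """Forward diffusion: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * noise, per node."""
    a = alpha_bar[t][:, None]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * noise

def mlp_denoiser(params, z_t, x, t, num_steps):
    """Hypothetical one-hidden-layer MLP that predicts the noise from
    (noisy teacher embedding z_t, node features x, normalized timestep)."""
    t_feat = (t / num_steps)[:, None]
    h = np.concatenate([z_t, x, t_feat], axis=1)
    h = np.maximum(0.0, h @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]

def ta_noise_loss(params, teacher_z, x, alpha_bar, rng):
    """Epsilon-prediction loss for the teacher assistant on one batch of nodes."""
    n, num_steps = teacher_z.shape[0], len(alpha_bar)
    t = rng.integers(0, num_steps, size=n)        # one random timestep per node
    noise = rng.standard_normal(teacher_z.shape)  # Gaussian noise to be predicted
    z_t = q_sample(teacher_z, t, alpha_bar, noise)
    pred = mlp_denoiser(params, z_t, x, t, num_steps)
    return np.mean((pred - noise) ** 2)

# Toy usage with random stand-ins for the frozen teacher GNN's embeddings.
rng = np.random.default_rng(0)
n, d_feat, d_emb, hidden, T = 8, 5, 4, 16, 100
alpha_bar = np.cumprod(1.0 - linear_beta_schedule(T))
params = {
    "W1": rng.standard_normal((d_emb + d_feat + 1, hidden)) * 0.1,
    "b1": np.zeros(hidden),
    "W2": rng.standard_normal((hidden, d_emb)) * 0.1,
    "b2": np.zeros(d_emb),
}
teacher_z = rng.standard_normal((n, d_emb))
x = rng.standard_normal((n, d_feat))
loss = ta_noise_loss(params, teacher_z, x, alpha_bar, rng)
print(f"TA noise-prediction loss: {loss:.4f}")
```

Conditioning the denoiser on raw node features is a design assumption here; it reflects that the eventual student MLP only has access to node features, but the paper may condition differently.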

Abstract

For large-scale applications, there is growing interest in replacing Graph Neural Networks (GNNs) with lightweight Multi-Layer Perceptrons (MLPs) via knowledge distillation. However, distilling GNNs trained for self-supervised graph representation learning into MLPs is more challenging, because the performance of self-supervised learning depends more on the model's inductive bias than that of supervised learning does. This motivates us to design a new distillation method that bridges the large capacity gap between GNNs and MLPs in self-supervised graph representation learning. In this paper, we propose Diffusion-Assisted Distillation for Self-supervised Graph representation learning with MLPs (DAD-SGM). The proposed method employs a denoising diffusion model as a teacher assistant to better distill the knowledge of the teacher GNN into the student MLP. This approach enhances the generalizability and robustness of MLPs in self-supervised graph representation learning. Extensive experiments demonstrate that DAD-SGM distills the knowledge of self-supervised GNNs more effectively than state-of-the-art GNN-to-MLP distillation methods. Our implementation is available at https://github.com/SeongJinAhn/DAD-SGM.
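One possible reading of distillation through a diffusion teacher assistant is sketched below: the teacher's embeddings are noised with a shared timestep and noise sample, a frozen TA and a smaller student both estimate that noise, and the student is trained to match the TA's estimate. This is a hypothetical sketch, not the authors' method. The shared input construction, the use of the frozen TA's prediction as the target, the MSE objective, the widths, and every name (`mlp`, `init_mlp`, `noise_alignment_loss`) are assumptions; only the training-time loss is computed, with no optimizer step.

```python
import numpy as np

def mlp(params, h):
    """Hypothetical one-hidden-layer ReLU MLP."""
    hidden = np.maximum(0.0, h @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]

def init_mlp(rng, d_in, d_hidden, d_out):
    """Small random initialization for the hypothetical MLP."""
    return {"W1": rng.standard_normal((d_in, d_hidden)) * 0.1, "b1": np.zeros(d_hidden),
            "W2": rng.standard_normal((d_hidden, d_out)) * 0.1, "b2": np.zeros(d_out)}

def noise_alignment_loss(ta_params, student_params, teacher_z, x, alpha_bar, rng):
    """MSE between the student's and the frozen TA's noise estimates (assumed objective)."""
    n, num_steps = teacher_z.shape[0], len(alpha_bar)
    t = rng.integers(0, num_steps, size=n)         # shared timestep per node
    noise = rng.standard_normal(teacher_z.shape)   # shared Gaussian noise
    a = alpha_bar[t][:, None]
    z_t = np.sqrt(a) * teacher_z + np.sqrt(1.0 - a) * noise  # DDPM forward process
    inp = np.concatenate([z_t, x, (t / num_steps)[:, None]], axis=1)
    eps_ta = mlp(ta_params, inp)               # target: frozen TA's noise estimate
    eps_student = mlp(student_params, inp)     # student's noise estimate
    return np.mean((eps_student - eps_ta) ** 2)

# Toy usage with random stand-ins for the teacher GNN's embeddings.
rng = np.random.default_rng(0)
n, d_feat, d_emb, T = 8, 5, 4, 100
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))
teacher_z = rng.standard_normal((n, d_emb))
x = rng.standard_normal((n, d_feat))
d_in = d_emb + d_feat + 1
ta_params = init_mlp(rng, d_in, 32, d_emb)       # wider, frozen teacher assistant
student_params = init_mlp(rng, d_in, 8, d_emb)   # narrower student
loss = noise_alignment_loss(ta_params, student_params, teacher_z, x, alpha_bar, rng)
print(f"Noise-alignment loss: {loss:.4f}")
```

Matching the TA rather than the GNN directly is the point of a teacher assistant: the TA is itself an MLP-based model, so its targets may be easier for a small MLP to fit than the GNN's outputs.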

Paper Structure

This paper contains 27 sections, 7 equations, 10 figures, 9 tables, 2 algorithms.

Figures (10)

  • Figure 1: (a) The node classification and link prediction performance of two GNN-to-MLP distillation methods (FF-G2M and LLP) on the Citeseer dataset, and (b) the node classification performance of a 1-layer GCN and a 3-layer MLP in supervised and unsupervised settings.
  • Figure 2: The procedure of training our assistant denoising diffusion model in the first stage.
  • Figure 3: An overview of the proposed DAD-SGM to train its student MLP diffusion model.
  • Figure 4: Inference time (ms) and accuracy (F1 score) on the Ogbn-products dataset.
  • Figure 5: Comparison of our TA model with other candidates. Blue bars represent the node classification performance of TAs, and orange bars represent that of their distilled students. The red dotted line shows the performance of a baseline student model directly distilled from the teacher (without TA).
  • ...and 5 more figures