
VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs

Ling Yang, Ye Tian, Minkai Xu, Zhongyi Liu, Shenda Hong, Wei Qu, Wentao Zhang, Bin Cui, Muhan Zhang, Jure Leskovec

TL;DR

This work tackles the limited expressiveness of class-based targets in GNN-to-MLP distillation by introducing VQGraph, which learns a structure-aware graph tokenizer that encodes each node's local substructure as a discrete code from a learned codebook. A structure-aware distillation objective based on soft code assignments transfers both local neighborhood information and global structure-discriminative cues from GNNs to MLPs, without increasing inference time. Across seven datasets and both transductive and inductive settings, VQGraph achieves state-of-the-art distillation performance, improves over GNNs by an average of $3.90\%$, and delivers up to $828\times$ faster inference than GNNs, demonstrating practical impact for scalable graph learning. The approach also extends to heterophilic graphs and offers strong robustness and interpretability through codebook analyses and subgraph-level correspondence. Overall, VQGraph provides a scalable, structure-rich representation space that substantially enhances GNN-to-MLP distillation and practical deployment on large graphs.

Abstract

GNN-to-MLP distillation uses knowledge distillation (KD) to learn a computationally efficient multi-layer perceptron (student MLP) on graph data by mimicking the output representations of a teacher GNN. Existing methods mainly make the MLP mimic the GNN's predictions over a few class labels. However, the class space may not be expressive enough to cover the many diverse local graph structures, which limits knowledge transfer from GNN to MLP. To address this issue, we propose to learn a new, more expressive graph representation space for GNN-to-MLP distillation by directly labeling nodes' diverse local structures. Specifically, we propose a variant of VQ-VAE to learn a structure-aware tokenizer on graph data that encodes each node's local substructure as a discrete code. The discrete codes constitute a codebook, a new graph representation space that identifies the different local graph structures of nodes by their code indices. Then, based on the learned codebook, we propose a new distillation target, namely soft code assignments, to directly transfer the structural knowledge of each node from GNN to MLP. The resulting framework, VQGraph, achieves new state-of-the-art performance on GNN-to-MLP distillation in both transductive and inductive settings across seven graph datasets. We show that VQGraph runs inference 828x faster than GNNs while performing better, and that it improves accuracy over GNNs and stand-alone MLPs by 3.90% and 28.05% on average, respectively. Code: https://github.com/YangLing0818/VQGraph.
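The two ideas in the abstract, tokenizing a node representation into a discrete code and distilling through soft code assignments, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy codebook, the use of squared Euclidean distance, the temperature `tau`, and the KL-divergence loss are all assumptions chosen for illustration. The abstract only states that nodes are mapped to codes in a learned codebook and that soft code assignments serve as the distillation target.

```python
import math

def sq_dist(a, b):
    # Squared Euclidean distance between two vectors (assumed similarity measure).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def tokenize(h, codebook):
    # Hard assignment: index of the codebook entry nearest to representation h.
    return min(range(len(codebook)), key=lambda k: sq_dist(h, codebook[k]))

def soft_code_assignment(h, codebook, tau=1.0):
    # Soft assignment: softmax over negative distances to every code,
    # scaled by a hypothetical temperature tau.
    logits = [-sq_dist(h, e) / tau for e in codebook]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [v / z for v in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two discrete distributions over code indices.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy codebook with three 2-d codes (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
h_teacher = [0.9, 0.1]  # hypothetical teacher GNN representation of a node
h_student = [0.6, 0.3]  # hypothetical student MLP representation of the same node

code = tokenize(h_teacher, codebook)
r_teacher = soft_code_assignment(h_teacher, codebook, tau=0.5)
r_student = soft_code_assignment(h_student, codebook, tau=0.5)
loss = kl_divergence(r_teacher, r_student)
```

In this sketch, `code` is the structure label of the node, and `loss` shrinks as the student's distribution over codes approaches the teacher's. Under these assumptions the student is pushed to match the teacher over the whole codebook rather than over a handful of class labels.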

Paper Structure

This paper contains 41 sections, 7 equations, 8 figures, 13 tables.

Figures (8)

  • Figure 1: t-SNE visualization of the graph representation space learned by two kinds of teacher GNNs: (a) the previous SOTA "class-based" NOSMOG (Tian et al., 2023) and (b) our "structure-based" VQGraph. "Class-based" denotes learning with class labels, and "structure-based" denotes learning with our local structure reconstruction. Our learned space is more compact. We provide both class labels and our structure labels, along with illustrative substructures, for demonstration.
  • Figure 2: Schematic diagram of VQGraph, including graph tokenizer training (top) and structure-aware code-based GNN-to-MLP distillation (bottom).
  • Figure 3: Accuracy vs. Inference Time.
  • Figure 4: t-SNE visualization of learned node representations; colors denote different classes.
  • Figure 5: The query node and 4 closest nodes in distilled MLP representation space with corresponding subgraphs.
  • ...and 3 more figures