Adversarial Curriculum Graph-Free Knowledge Distillation for Graph Neural Networks
Yuang Jia, Xiaojuan Shan, Jun Xia, Guancheng Wan, Yuchen Zhang, Wenke Huang, Mang Ye, Stan Z. Li
TL;DR
This paper tackles data-free knowledge distillation for graph neural networks by proposing ACGKD. ACGKD models pseudo-graph structures with Binary Concrete distributions so that gradients can flow through the graph structure, and it substantially reduces spatial complexity with a learnable parameter ξ. It further augments distillation in two ways: curriculum learning moves training from easier to harder pseudo-graphs, and a dynamic adversarial temperature tightens knowledge transfer. It also reuses the teacher's classifier through a GAT-based projection that resolves dimensional mismatches between student and teacher. Across six benchmarks, including bioinformatics and social graphs, ACGKD achieves state-of-the-art performance without real data and significantly cuts pseudo-graph generation time compared with prior graph-free KD methods. The approach generalizes well across teacher–student pairs and offers practical benefits for privacy-preserving model compression and knowledge transfer in graph domains.
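The Binary Concrete relaxation mentioned above can be illustrated with a small, generic sketch. This is the standard Binary Concrete (relaxed Bernoulli, or "Gumbel-sigmoid") construction, not ACGKD's implementation. Each candidate edge has a logit log α. Logistic noise is added to it, and the result is passed through a temperature-scaled sigmoid. The resulting soft edge weight is a smooth function of log α, so an autodiff framework could backpropagate through it. The names binary_concrete, sample_relaxed_adjacency and edge_logits are hypothetical, and the spatial-complexity parameter ξ is not modeled here.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_concrete(log_alpha: float, temperature: float, rng: random.Random) -> float:
    """Draw one relaxed Bernoulli sample in [0, 1] (Binary Concrete)."""
    u = rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep the logistic noise finite
    logistic_noise = math.log(u) - math.log(1.0 - u)
    return sigmoid((log_alpha + logistic_noise) / temperature)


def sample_relaxed_adjacency(edge_logits: Sequence[Sequence[float]],
                             temperature: float, seed: int = 0) -> List[List[float]]:
    """Sample a symmetric, zero-diagonal soft adjacency matrix.

    edge_logits[i][j] is a hypothetical per-edge parameter log(alpha_ij).
    Only the upper triangle (i < j) is read; it is mirrored so the
    pseudo-graph is undirected.
    """
    rng = random.Random(seed)
    n = len(edge_logits)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = binary_concrete(edge_logits[i][j], temperature, rng)
            adj[i][j] = adj[j][i] = a
    return adj


if __name__ == "__main__":
    logits = [[0.0, 4.0, -4.0], [4.0, 0.0, 0.0], [-4.0, 0.0, 0.0]]
    for row in sample_relaxed_adjacency(logits, temperature=0.5):
        print([round(a, 3) for a in row])
```

This construction has a useful property for checking the sampler: the probability that a sample exceeds 0.5 equals sigmoid(log α) at any temperature. Lowering the temperature only pushes the samples closer to hard 0/1 edges.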
Abstract
Data-free Knowledge Distillation (DFKD) uses a generator to construct pseudo-samples without access to real data. It transfers knowledge from a teacher model to a student by forcing the student to overcome dimensional differences and learn to mimic the teacher's outputs on these pseudo-samples. In recent years, various studies in the vision domain have made notable advances in this area. However, the varying topological structures and non-grid nature of graph data render vision-domain methods ineffective. Building on prior research into differentiable methods for graph neural networks, we propose a fast, high-quality data-free knowledge distillation approach. Without compromising distillation quality, the proposed graph-free KD method (ACGKD) significantly reduces the spatial complexity of pseudo-graphs. It does so by using the Binary Concrete distribution to model the graph structure and by introducing a spatial complexity tuning parameter. This approach enables efficient gradient computation for the graph structure, which accelerates the overall distillation process. Additionally, ACGKD eliminates the dimensional ambiguity between the student and teacher models by increasing the student's dimensions and reusing the teacher's classifier. Moreover, it equips graph knowledge distillation with a curriculum learning (CL) strategy so that the student learns graph structures progressively. Extensive experiments demonstrate that ACGKD achieves state-of-the-art performance in distilling knowledge from GNNs without training data.
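The distillation objective and the easy-to-hard schedule described above can be sketched under explicit assumptions. The loss is the conventional temperature-scaled KL divergence between the teacher's and student's softened predictions, multiplied by T² as in standard KD, and it uses a fixed temperature. ACGKD's dynamic adversarial temperature and its GAT-based projection onto the teacher's classifier are not reproduced. The curriculum relies on two further assumptions: a hypothetical difficulty score (the entropy of the teacher's prediction on each pseudo-graph) and a hypothetical linear pacing function. All function names are illustrative.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax using the log-sum-exp trick for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float) -> float:
    """Conventional KD loss: T^2 * KL(p_teacher || p_student), both softened by T."""
    log_pt = log_softmax(teacher_logits, temperature)
    log_ps = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
    return temperature ** 2 * kl


def prediction_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits)."""
    return -sum(math.exp(lp) * lp for lp in log_softmax(logits))


def curriculum_order(teacher_logits_per_graph: Sequence[Sequence[float]]) -> List[int]:
    """Indices of pseudo-graphs sorted from easy to hard.

    Hypothetical difficulty score: a confident (low-entropy) teacher prediction
    marks an "easy" pseudo-graph. This is an assumption, not ACGKD's criterion.
    """
    scores = [prediction_entropy(z) for z in teacher_logits_per_graph]
    return sorted(range(len(scores)), key=lambda i: scores[i])


def curriculum_subset(order: List[int], epoch: int, total_epochs: int,
                      min_fraction: float = 0.25) -> List[int]:
    """Hypothetical linear pacing: expose a growing easy-first prefix each epoch."""
    progress = epoch / max(1, total_epochs - 1)
    fraction = min(1.0, min_fraction + (1.0 - min_fraction) * progress)
    k = max(1, math.ceil(fraction * len(order)))
    return order[:k]


if __name__ == "__main__":
    teacher = [[5.0, 0.0, 0.0], [1.0, 1.0, 1.0], [3.0, 0.0, 0.0]]
    order = curriculum_order(teacher)
    for epoch in range(4):
        print(epoch, curriculum_subset(order, epoch, total_epochs=4))
```

Teacher entropy is only one plausible easiness proxy. Any per-graph score could be substituted in curriculum_order without changing the pacing logic or the loss.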
