EXGC: Bridging Efficiency and Explainability in Graph Condensation
Junfeng Fang, Xinglin Li, Yongduo Sui, Yuan Gao, Guibin Zhang, Kun Wang, Xiang Wang, Xiangnan He
TL;DR
The paper tackles inefficiencies in graph condensation for large-scale graphs by identifying two bottlenecks: extensive concurrent parameter updates and redundancy in the synthetic graph. It proposes MGCond, which uses Mean-Field variational approximation to accelerate the EM-like E-step, and EXGC, which employs Gradient Information Bottleneck (GDIB) with explainers (e.g., SA, GSAT, GNNExplainer) to prune training redundancy and provide explainability. Through extensive experiments on six node-classification and three graph-classification datasets, EXGC achieves substantial speedups (often by factors of tens to hundreds) while maintaining or improving accuracy, and it generalizes well to DosGCond and across multiple backbones. The approach advances practical graph condensation by combining efficiency gains with interpretable training dynamics, offering guidelines for node selection and demonstrating cross-architecture transferability. Future work includes pruning redundancy at initialization and applying the framework to additional graph-centric tasks.
Abstract
Graph representation learning on vast datasets, such as web data, has made significant strides. However, the associated computational and storage overheads raise concerns. In light of this, graph condensation (GCond) has been introduced to distill these large real datasets into a more concise yet information-rich synthetic graph. Despite acceleration efforts, existing GCond methods still struggle with efficiency, especially on expansive web data graphs. Hence, in this work, we pinpoint two major inefficiencies of current paradigms: (1) the concurrent updating of a vast parameter set, and (2) pronounced parameter redundancy. To counteract these two limitations respectively, we (1) employ the Mean-Field variational approximation to accelerate convergence, and (2) propose the Gradient Information Bottleneck (GDIB) objective to prune redundancy. By instantiating the GDIB with leading explanation techniques (e.g., GNNExplainer and GSAT), we propose EXGC, the Efficient and eXplainable Graph Condensation method, which can markedly boost efficiency and inject explainability. Our extensive evaluations across eight datasets underscore EXGC's superiority and relevance. Code is available at https://github.com/MangoKiller/EXGC.
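To make the redundancy-pruning idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple gradient-magnitude score stands in for the explainer (the paper instead instantiates GDIB with learned explainers such as GNNExplainer and GSAT), and the names `select_nodes_by_gradient`, `masked_update`, and `keep_ratio` are hypothetical. Only the synthetic nodes with the largest gradient norms receive an update; the rest are left unchanged, which reduces the number of parameters updated concurrently.

```python
import math


def select_nodes_by_gradient(node_grads, keep_ratio):
    """Return sorted indices of the nodes with the largest gradient L2 norms.

    Assumption: gradient magnitude is used here as a stand-in importance
    score; it is not the explainer used in the paper.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Score each synthetic node by the L2 norm of its feature gradient.
    scores = [math.sqrt(sum(g * g for g in grad)) for grad in node_grads]
    # Keep at least one node, rounding the budget up.
    k = max(1, math.ceil(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def masked_update(features, node_grads, active, lr):
    """Apply one gradient-descent step to active nodes only.

    Nodes outside `active` are copied unchanged, so they cost no update.
    """
    active_set = set(active)
    return [
        [x - lr * g for x, g in zip(feat, grad)] if i in active_set else list(feat)
        for i, (feat, grad) in enumerate(zip(features, node_grads))
    ]


# Toy example: four synthetic nodes with two features each.
grads = [[0.0, 0.1], [3.0, 4.0], [0.5, 0.5], [1.0, 0.0]]
feats = [[1.0, 1.0] for _ in grads]
active = select_nodes_by_gradient(grads, keep_ratio=0.5)
updated = masked_update(feats, grads, active, lr=0.1)
```

In this toy run, half of the nodes are selected for updating and the remaining rows of `updated` equal the corresponding rows of `feats`. A learned explainer would replace the norm-based score while the masking step stays the same.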
