EXGC: Bridging Efficiency and Explainability in Graph Condensation

Junfeng Fang, Xinglin Li, Yongduo Sui, Yuan Gao, Guibin Zhang, Kun Wang, Xiang Wang, Xiangnan He

TL;DR

The paper tackles inefficiencies in graph condensation for large-scale graphs by identifying two bottlenecks: extensive concurrent parameter updates and redundancy in the synthetic graph. It proposes MGCond, which uses Mean-Field variational approximation to accelerate the EM-like E-step, and EXGC, which employs Gradient Information Bottleneck (GDIB) with explainers (e.g., SA, GSAT, GNNExplainer) to prune training redundancy and provide explainability. Through extensive experiments on six node-classification and three graph-classification datasets, EXGC achieves substantial speedups (often by factors of tens to hundreds) while maintaining or improving accuracy, and it generalizes well to DosGCond and across multiple backbones. The approach advances practical graph condensation by combining efficiency gains with interpretable training dynamics, offering guidelines for node selection and demonstrating cross-architecture transferability. Future work includes pruning redundancy at initialization and applying the framework to additional graph-centric tasks.

Abstract

Graph representation learning on vast datasets, like web data, has made significant strides. However, the associated computational and storage overheads raise concerns. In light of this, graph condensation (GCond) has been introduced to distill these large real datasets into a more concise yet information-rich synthetic graph. Despite acceleration efforts, existing GCond methods still struggle with efficiency, especially on expansive web data graphs. Hence, in this work, we pinpoint two major inefficiencies of current paradigms: (1) the concurrent updating of a vast parameter set, and (2) pronounced parameter redundancy. To counteract these two limitations respectively, we (1) employ the Mean-Field variational approximation to accelerate convergence, and (2) propose the Gradient Information Bottleneck (GDIB) objective to prune redundancy. By incorporating leading explanation techniques (e.g., GNNExplainer and GSAT) to instantiate the GDIB, we propose EXGC, the Efficient and eXplainable Graph Condensation method, which markedly boosts efficiency and injects explainability. Our extensive evaluations across eight datasets underscore EXGC's superiority and relevance. Code is available at https://github.com/MangoKiller/EXGC.
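
The summary above does not spell out how an explainer decides which synthetic nodes to keep training. As a rough illustration only, the sketch below assumes a gradient-saliency explainer (in the spirit of the SA option named in the TL;DR) that scores each synthetic node by the norm of the loss gradient with respect to its features and keeps the top fraction. The function name `select_nodes_by_saliency`, the `keep_ratio` parameter, and the toy gradients are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def select_nodes_by_saliency(grad_x, keep_ratio):
    """Return sorted indices of the nodes with the largest gradient norms.

    grad_x: array of shape (num_nodes, num_features), assumed to hold the
            gradient of the condensation loss w.r.t. each synthetic node.
    keep_ratio: fraction of nodes to keep (hypothetical knob for this sketch).
    """
    scores = np.linalg.norm(grad_x, axis=1)          # one saliency score per node
    k = max(1, int(round(keep_ratio * len(scores))))  # always keep at least one node
    top = np.argsort(-scores, kind="stable")[:k]      # highest scores first
    return np.sort(top)

# Toy gradients for four synthetic nodes with two features each.
grad_x = np.array([
    [0.0, 0.1],
    [2.0, -1.0],
    [0.05, 0.0],
    [-1.5, 0.5],
])

selected = select_nodes_by_saliency(grad_x, keep_ratio=0.5)

# Boolean mask marking which nodes would keep receiving updates.
mask = np.zeros(len(grad_x), dtype=bool)
mask[selected] = True
print(selected, mask)
```

In a real condensation loop the mask would restrict updates to the selected nodes; how EXGC schedules and grows that set is described in the paper, not here.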

Paper Structure (20 sections, 40 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: The compression capability and limitations of current GCond. (a) GCond adeptly compresses the dataset to just 0.1% of its initial size without compromising the accuracy benchmarks. (b) Contrary to traditional graph learning, GCond's parameters scale with node count. (c) To avoid insufficient information capacity, GCond typically introduces node redundancy.
  • Figure 2: The paradigm of current GCond methods from the perspective of the EM schema, and the E-step of our proposed MGCond and EXGC.
  • Figure 3: The training process of EXGC and GCond on four benchmarks: Cora, Citeseer, Ogbn-Arxiv, and Ogbn-Product. EXGC reaches its best performance 507, 1097, 832, and 366 epochs earlier than GCond, respectively, at which points training can be terminated.
  • Figure 4: Performance comparison across six benchmarks under various explanation methods.