Simple Graph Condensation

Zhenbang Xiao, Yu Wang, Shunyu Liu, Huiqiong Wang, Mingli Song, Tongya Zheng

TL;DR

This work tackles the high training costs of large-scale graphs by introducing Simple Graph Condensation (SimGC), a parameter-light framework that condenses a large graph into a small, informative graph for training GNNs. The method uses a pre-trained Simple Graph Convolution (SGC) on the original graph to guide condensation through layerwise representation alignment, output logit alignment, and a kernel-based feature-smoothness regularizer, optimizing the combined loss $L = \alpha L_{rep} + \beta L_{lgt} + \gamma L_{smt}$ with no external parameters. Empirical results on seven datasets show that SimGC achieves competitive or superior accuracy while accelerating condensation by up to 10x, and it demonstrates strong generalization across GNN architectures, as well as utility for neural architecture search and knowledge distillation on condensed graphs. The findings suggest SimGC offers a practical, scalable route to efficient GNN training on massive graphs without sacrificing performance, with potential extensions to heterogeneous graphs and hypergraphs in future work.

Abstract

The burdensome training costs on large-scale graphs have aroused significant interest in graph condensation, which involves tuning Graph Neural Networks (GNNs) on a small condensed graph for use on the large-scale original graph. Existing methods primarily focus on aligning key metrics between the condensed and original graphs, such as the gradients, output distributions, and training trajectories of GNNs, yielding satisfactory performance on downstream tasks. However, these complex metrics necessitate intricate external parameters and can disrupt the optimization of the condensed graph, making the condensation process highly demanding and unstable. Motivated by the recent success of simplified models across various domains, we propose a simplified approach to metric alignment in graph condensation, aiming to reduce unnecessary complexity inherited from intricate metrics. We introduce the Simple Graph Condensation (SimGC) framework, which aligns the condensed graph with the original graph from the input layer to the prediction layer, guided by a Simple Graph Convolution (SGC) model pre-trained on the original graph. Importantly, SimGC eliminates external parameters and retains only the target condensed graph during the condensation process. This straightforward yet effective strategy achieves a speedup of up to 10 times over existing graph condensation methods while performing on par with state-of-the-art baselines. Comprehensive experiments on seven benchmark datasets demonstrate the effectiveness of SimGC in prediction accuracy, condensation time, and generalization capability. Our code is available at https://github.com/BangHonor/SimGC.
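
As an illustration of the guiding model, the sketch below computes the layerwise representations that an SGC-style model produces by repeatedly multiplying node features by the symmetrically normalized adjacency matrix with self-loops, $S = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$. It is a minimal pure-Python sketch, not the authors' implementation: the function names, the toy path graph, and the feature values are all assumptions made for this example, and the trainable linear classifier that SGC applies after propagation is omitted.

```python
import math


def normalized_adjacency(edges, num_nodes):
    """Return S = D^{-1/2} (A + I) D^{-1/2} as a dense list-of-lists matrix."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # add self-loops
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0  # treat the graph as undirected
    deg = [sum(row) for row in a]  # degrees including the self-loop
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def sgc_representations(edges, features, num_hops):
    """Return [X, S X, S^2 X, ..., S^K X] for K = num_hops."""
    s = normalized_adjacency(edges, len(features))
    reps = [features]
    for _ in range(num_hops):
        reps.append(matmul(s, reps[-1]))
    return reps


# Hypothetical toy input: path graph 0 - 1 - 2 with 2-dimensional features.
edges = [(0, 1), (1, 2)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reps = sgc_representations(edges, features, num_hops=2)
```

Each entry of `reps` is one propagation depth, which is the kind of per-layer quantity that an alignment "from the input layer to the prediction layer" could compare between the original and condensed graphs.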

Paper Structure (20 sections, 6 equations, 2 figures, 7 tables, 1 algorithm)

Figures (2)

  • Figure 1: (a) Our proposed SimGC framework operates in two stages. First, an SGC model is pre-trained on the original graph. Then the condensed graph is aligned with the original graph through representation alignment in the aggregation layers and logit alignment in the output layer, together with a feature smoothness regularizer. (b) Overall performance of all methods on Ogbn-products, the largest dataset, where "0.02%", "0.04%", "0.08%" refer to the reduction rates.
  • Figure 2: Comparison of the proposed SimGC with two variants, where Ogbn-arxiv, Reddit, Reddit2 and Ogbn-products are denoted as "OA", "RD", "RD2", "OP".
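
The three terms named in the Figure 1 caption are combined in the TL;DR as the weighted sum $L = \alpha L_{rep} + \beta L_{lgt} + \gamma L_{smt}$. The sketch below shows only how such a weighted objective could be assembled. It is written under stated assumptions: `mean_alignment` is a hypothetical stand-in for $L_{rep}$ that compares per-layer mean representations, which this summary does not confirm as the paper's definition, and the values passed for $L_{lgt}$ and $L_{smt}$ are made-up placeholder scalars.

```python
def column_means(matrix):
    """Mean of each feature column over all nodes (rows)."""
    n = len(matrix)
    return [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]


def mean_alignment(original_reps, condensed_reps):
    """Hypothetical stand-in for L_rep: squared distance between the mean
    representations of the two graphs, summed over layers."""
    total = 0.0
    for orig, cond in zip(original_reps, condensed_reps):
        mean_orig, mean_cond = column_means(orig), column_means(cond)
        total += sum((a - b) ** 2 for a, b in zip(mean_orig, mean_cond))
    return total


def combined_loss(l_rep, l_lgt, l_smt, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum L = alpha * L_rep + beta * L_lgt + gamma * L_smt."""
    return alpha * l_rep + beta * l_lgt + gamma * l_smt


# Made-up data: one layer, 4 original nodes and 2 condensed nodes.
original_reps = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]]
condensed_reps = [[[0.5, 0.5], [1.0, 0.0]]]

l_rep = mean_alignment(original_reps, condensed_reps)
# l_lgt and l_smt are placeholder scalars, not computed from any model here.
loss = combined_loss(l_rep, l_lgt=0.2, l_smt=0.1, alpha=1.0, beta=0.5, gamma=2.0)
```

In a real condensation loop the condensed features would be the only trainable quantity, and each term would be recomputed from them at every step; the weights $\alpha$, $\beta$, $\gamma$ trade off the three signals.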