GCondenser: Benchmarking Graph Condensation

Yilun Liu, Ruihong Qiu, Zi Huang

TL;DR

GCondenser introduces the first large-scale benchmark for graph condensation (GC). It standardises the condensation-validation-evaluation workflow so that GC methods can be compared fairly and extended easily. It surveys the gradient, distribution, trajectory, and eigenbasis matching paradigms, and provides a unified framework for initialisation, learning, and validation with diverse validators. Across seven datasets and multiple backbones, the study shows that proper hyperparameter tuning and validation can substantially improve condensed-graph quality. Trajectory and distribution matching methods perform strongly on larger graphs, while structure-free methods remain competitive in many settings. The open-source benchmark supports practical assessment of GC methods for node classification, as well as studies of cross-architecture transferability and continual graph learning, to guide future GC research.

Abstract

Large-scale graphs are valuable for graph representation learning, yet the sheer volume of data in these graphs makes training inefficient. Graph condensation (GC) alleviates this issue by compressing a large graph into a significantly smaller one that still supports effective model training. Although recent research has introduced various approaches to improve the effectiveness of condensed graphs, comprehensive and practical evaluations across GC methods have been neglected. This paper proposes the first large-scale graph condensation benchmark, GCondenser, to holistically evaluate and compare mainstream GC methods. GCondenser provides a standardised GC paradigm consisting of condensation, validation, and evaluation procedures, and it can be extended to new GC methods and datasets. With GCondenser, a comprehensive performance study is conducted to assess the effectiveness of existing methods. GCondenser is open-sourced and available at https://github.com/superallen13/GCondenser.
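
The condensation, validation, and evaluation procedures can be pictured as one small loop. The sketch below is a hypothetical illustration, not GCondenser's code or API. It assumes that validation means training a model on each intermediate condensed graph and scoring it on the original graph's validation nodes, so that the best-scoring condensed graph is kept and only that graph is evaluated on the test nodes. To stay self-contained, a nearest-class-centroid classifier stands in for the GNN validators and evaluators, the condensed graph is just a list of labelled feature vectors (no edges), and `make_pull_step` is a toy condenser invented for the demo.

```python
import random
from typing import Callable, Dict, List, Tuple

Node = Tuple[List[float], int]  # (feature vector, class label); edges are omitted in this toy


def train_centroid_model(train: List[Node]) -> Callable[[List[float]], int]:
    """Hypothetical stand-in for training a GNN: a nearest-class-centroid classifier."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x: List[float]) -> int:
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def accuracy(predict: Callable[[List[float]], int], data: List[Node]) -> float:
    return sum(predict(x) == y for x, y in data) / len(data)


def copy_graph(graph: List[Node]) -> List[Node]:
    return [(list(x), y) for x, y in graph]


def condense_validate_evaluate(condensed, condense_step, val, test, epochs):
    """Condense for `epochs` rounds, keep the condensed graph with the best validation
    accuracy, then evaluate a freshly trained model on the test nodes."""
    best_graph, best_val = copy_graph(condensed), -1.0
    for _ in range(epochs):
        condensed = condense_step(condensed)                      # 1. condensation
        val_acc = accuracy(train_centroid_model(condensed), val)  # 2. validation
        if val_acc > best_val:
            best_val, best_graph = val_acc, copy_graph(condensed)
    test_acc = accuracy(train_centroid_model(best_graph), test)   # 3. evaluation
    return best_graph, best_val, test_acc


def make_pull_step(train: List[Node], rate: float = 0.5):
    """Hypothetical toy condenser: move each synthetic node toward its class mean."""
    means = {}
    for c in {y for _, y in train}:
        xs = [x for x, y in train if y == c]
        means[c] = [sum(col) / len(xs) for col in zip(*xs)]

    def step(condensed: List[Node]) -> List[Node]:
        return [([a + rate * (m - a) for a, m in zip(x, means[y])], y) for x, y in condensed]

    return step


# Usage on two well-separated Gaussian classes (toy data, not a benchmark dataset).
rng = random.Random(0)


def sample(n: int) -> List[Node]:
    nodes = []
    for _ in range(n):
        y = rng.randrange(2)
        nodes.append(([3.0 * y + rng.gauss(0, 0.5), 3.0 * y + rng.gauss(0, 0.5)], y))
    return nodes


train, val, test = sample(200), sample(50), sample(100)
# One synthetic node per class, initialised from nodes of the original data.
init = [next(node for node in train if node[1] == c) for c in (0, 1)]
best, best_val, test_acc = condense_validate_evaluate(init, make_pull_step(train), val, test, epochs=5)
print(f"best val acc = {best_val:.2f}, test acc = {test_acc:.2f}")
```

Keeping validation separate from the final evaluation is the design choice the sketch is meant to show: the test nodes are touched exactly once, after the condensed graph has been selected.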

Paper Structure (27 sections, 9 equations, 19 figures, 8 tables)

Figures (19)

  • Figure 1: The standardised graph condensation paradigm in GCondenser consists of condensation, validation, and evaluation modules. $\mathcal{G}'_{0}$ is first initialised from the original graph $\mathcal{G}$ and then matched with the original graph across $K$ initialised model parameter spaces (i.e., from $\boldsymbol{\theta}_0$ to $\boldsymbol{\theta}_{K-1}$). Each matching block represents either multi-step (GCond; Jin et al., 2022) or one-step (DosCond; Jin et al., 2022) matching.
  • Figure 2: Test accuracy against condensation time for different GC methods on a 90-node condensed graph of the Arxiv dataset, with GCN and SGC as backbone models. More results are in the appendix.
  • Figure 3: Transferability of condensed graphs for Cora with a budget of 35. More results are in the appendix.
  • Figure 4: GC methods on baseline datasets under the continual graph learning setting.
  • Figure 5: GCondenser for graph condensation with transductive (left) and inductive (right) settings.
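
The matching loop in Figure 1 can be sketched as follows: the condensed graph is matched against the original graph under $K$ freshly initialised parameter sets $\boldsymbol{\theta}_0, \dots, \boldsymbol{\theta}_{K-1}$, with several synthetic-feature updates per initialisation. This is a toy, hypothetical sketch, not GCondenser's implementation. It swaps in distribution matching (one of the paradigms surveyed in the TL;DR) for the gradient matching of the cited multi-step and one-step methods. It also ignores graph structure entirely (a structure-free variant), uses one random ReLU layer as the encoder, and derives the gradients by hand. All function names are invented for illustration.

```python
import random
from typing import Dict, List

Matrix = List[List[float]]  # node features: one row per node; weights: d x h


def pre_activation(x: List[float], w: Matrix) -> List[float]:
    """z = x @ w for one node feature vector x (length d) and weights w (d x h)."""
    return [sum(x[i] * w[i][k] for i in range(len(x))) for k in range(len(w[0]))]


def mean_embedding(xs: Matrix, w: Matrix) -> List[float]:
    """Mean of relu(x @ w) over a set of nodes: a class-wise distribution summary."""
    embs = [[max(0.0, z) for z in pre_activation(x, w)] for x in xs]
    return [sum(col) / len(embs) for col in zip(*embs)]


def dm_loss(real: Dict[int, Matrix], syn: Dict[int, Matrix], w: Matrix) -> float:
    """Sum over classes of the squared distance between real and synthetic mean embeddings."""
    total = 0.0
    for c in real:
        mr, ms = mean_embedding(real[c], w), mean_embedding(syn[c], w)
        total += sum((a - b) ** 2 for a, b in zip(mr, ms))
    return total


def dm_step(real: Dict[int, Matrix], syn: Dict[int, Matrix], w: Matrix, lr: float) -> None:
    """One gradient-descent step on the synthetic features (in place) for a fixed encoder w."""
    d, h = len(w), len(w[0])
    for c in real:
        mr, ms = mean_embedding(real[c], w), mean_embedding(syn[c], w)
        n = len(syn[c])
        for s in syn[c]:
            pre = pre_activation(s, w)
            # dL/dz_k for this node: -(2/n) * (mr_k - ms_k), masked by the ReLU derivative.
            g = [-2.0 / n * (mr[k] - ms[k]) * (1.0 if pre[k] > 0 else 0.0) for k in range(h)]
            # Chain rule through z = s @ w: dL/ds_i = sum_k w[i][k] * g_k.
            for i in range(d):
                s[i] -= lr * sum(w[i][k] * g[k] for k in range(h))


def random_encoder(rng: random.Random, d: int, h: int) -> Matrix:
    """A freshly initialised parameter set theta_k (Gaussian, std 1/sqrt(d)); never trained."""
    return [[rng.gauss(0.0, d ** -0.5) for _ in range(h)] for _ in range(d)]


def condense(real, syn, hidden=16, num_inits=100, steps_per_init=5, lr=0.1, seed=0):
    """Outer loop over K = num_inits random initialisations theta_0 ... theta_{K-1}."""
    rng = random.Random(seed)
    d = len(next(iter(real.values()))[0])
    for _ in range(num_inits):
        w = random_encoder(rng, d, hidden)
        for _ in range(steps_per_init):
            dm_step(real, syn, w, lr)
    return syn


# Usage: two toy classes of 3-dimensional features (not a benchmark dataset).
rng = random.Random(1)
real = {
    0: [[2.0 + rng.gauss(0, 0.3), rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(20)],
    1: [[rng.gauss(0, 0.3), 2.0 + rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(20)],
}
# Two synthetic nodes per class, started from small noise so the effect is easy to see.
syn = {c: [[rng.gauss(0, 0.1) for _ in range(3)] for _ in range(2)] for c in real}
# Fixed, held-out encoders used only to measure the matching loss.
eval_encoders = [random_encoder(random.Random(100 + i), 3, 16) for i in range(10)]
before = sum(dm_loss(real, syn, w) for w in eval_encoders)
condense(real, syn)
after = sum(dm_loss(real, syn, w) for w in eval_encoders)
print(f"matching loss on held-out encoders: {before:.3f} -> {after:.3f}")
```

The synthetic nodes start from noise here only so the drop in the matching loss is visible. Initialising them from sampled original nodes, as Figure 1 describes, works with `condense` unchanged.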