GC4NC: A Benchmark Framework for Graph Condensation on Node Classification with New Insights

Shengbo Gong, Juntong Ni, Noveen Sachdeva, Carl Yang, Wei Jin

TL;DR

GC4NC addresses the lack of a unified framework for graph condensation (GC) by introducing a multi-dimensional node-classification benchmark that standardizes evaluation across performance, efficiency, privacy preservation, denoising, neural architecture search (NAS), and transferability. It systematically analyzes key design choices (data initialization, structure-free versus structure-based condensation, and graph property preservation) and distills empirical insights, such as the accuracy-efficiency trade-off of trajectory-based methods and the privacy and robustness benefits of GC. Key findings show that trajectory matching yields strong accuracy but at high computational cost, while structure-free methods offer better scalability; depending on the configuration, GC also provides privacy-preservation and denoising advantages. The benchmark, code, and analyses are intended to guide future GC research toward scalable, private, and robust graph condensation suitable for diverse graph learning tasks.

Abstract

Graph condensation (GC) is an emerging technique designed to learn a significantly smaller graph that retains the essential information of the original graph. This condensed graph has shown promise in accelerating graph neural networks while preserving performance comparable to that achieved with the original, larger graphs. Additionally, this technique facilitates downstream applications like neural architecture search and deepens our understanding of redundancies in large graphs. Despite the rapid development of GC methods, particularly for node classification, a unified evaluation framework is still lacking to systematically compare different GC methods or clarify key design choices for improving their effectiveness. To bridge these gaps, we introduce GC4NC, a comprehensive framework for evaluating diverse GC methods on node classification across multiple dimensions, including performance, efficiency, privacy preservation, denoising ability, NAS effectiveness, and transferability. Our systematic evaluation offers novel insights into how condensed graphs behave and the critical design choices that drive their success. These findings pave the way for future advancements in GC methods, both enhancing their performance and expanding their real-world applications. Our code is available at https://github.com/Emory-Melody/GraphSlim/tree/main/benchmark.
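To make the core idea concrete, the sketch below illustrates one popular GC recipe, gradient matching (the "GM" family compared in this benchmark): synthetic node features are optimized so that classifier gradients computed on the small synthetic set mimic those computed on the full data. This is a hypothetical, heavily simplified toy (structure-free setting, linear softmax classifier, numerical gradients over a fixed pool of random weights), not the paper's implementation; all names and parameters are illustrative.

```python
import numpy as np

def softmax_ce_grad(X, y, W):
    """Gradient of softmax cross-entropy w.r.t. the weights W (d x C)."""
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0
    return X.T @ p / len(y)

def condense_by_gradient_matching(X, y, n_classes, per_class=2,
                                  n_weights=8, steps=30, lr=0.5, seed=0):
    """Toy, structure-free gradient-matching condensation (illustrative only).

    Learns synthetic features X_syn with fixed labels y_syn so that classifier
    gradients on (X_syn, y_syn) match those on the real data (X, y), averaged
    over a fixed pool of random weight samples.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    y_syn = np.repeat(np.arange(n_classes), per_class)
    # Initialize synthetic nodes from randomly sampled real nodes per class.
    X_syn = np.vstack([X[rng.choice(np.flatnonzero(y == c), per_class)]
                       for c in range(n_classes)]).astype(float)
    Ws = [rng.normal(scale=0.1, size=(d, n_classes)) for _ in range(n_weights)]
    g_real = [softmax_ce_grad(X, y, W) for W in Ws]

    def match_loss(Xs):
        return np.mean([np.sum((softmax_ce_grad(Xs, y_syn, W) - g) ** 2)
                        for W, g in zip(Ws, g_real)])

    losses, eps = [match_loss(X_syn)], 1e-5
    for _ in range(steps):
        # Numerical gradient of the matching loss w.r.t. X_syn (toy scale only).
        grad = np.zeros_like(X_syn)
        for idx in np.ndindex(*X_syn.shape):
            Xp = X_syn.copy(); Xp[idx] += eps
            Xm = X_syn.copy(); Xm[idx] -= eps
            grad[idx] = (match_loss(Xp) - match_loss(Xm)) / (2 * eps)
        # Backtracking step size so the matching loss never increases.
        step, cur = lr, losses[-1]
        while step > 1e-8:
            X_new = X_syn - step * grad
            if match_loss(X_new) <= cur:
                X_syn = X_new
                break
            step *= 0.5
        losses.append(match_loss(X_syn))
    return X_syn, y_syn, losses
```

Real GM-style methods differ in important ways (they also learn an adjacency or drop it explicitly, use GNN gradients with backpropagation rather than finite differences, and resample model weights during training), but the objective shape is the same: a small labeled set whose training signal imitates the original graph's.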


Paper Structure

This paper contains 43 sections, 10 figures, and 18 tables.

Figures (10)

  • Figure 1: Test accuracy vs. total time for structure-free and structure-based condensation methods on Arxiv. Trajectory matching (TM) is represented by $\bigstar$, gradient matching (GM) by $\bullet$, and distribution matching (DM) by $\blacktriangle$. Marker sizes increase with reduction rates of 0.05%, 0.25%, and 0.50%.
  • Figure 2: Comparison of GPU memory, disk memory, preprocessing time, and total time on Arxiv ($r=0.5\%$).
  • Figure 3: Varying reduction rates on Arxiv and Reddit. Missing markers indicate out-of-memory (OOM) failures when the reduction rate is too large for a given method.
  • Figure 4: Condensed graph performance evaluated by different GNNs. Relative accuracy refers to the fraction of accuracy preserved compared to training on the whole dataset.
  • Figure 5: Test accuracy for different methods with different initialization.
  • ...and 5 more figures