Fast Graph Condensation with Structure-based Neural Tangent Kernel
Lin Wang, Wenqi Fan, Jiatong Li, Yao Ma, Qing Li
TL;DR
The paper tackles the inefficiency of graph-data condensation for training GNNs on large graphs by reframing the problem as Kernel Ridge Regression (KRR) instead of a costly bi-level optimization. It introduces GC-SNTK, a framework that uses a Structure-based Neural Tangent Kernel (SNTK) to capture graph topology through neighborhood aggregation within the KRR paradigm, enabling a single-loop optimization. The approach yields substantial speedups over prior bi-level methods while preserving or surpassing predictive performance across multiple graph datasets and GNN architectures, and it demonstrates robust cross-model generalization. These findings suggest a practical route to scalable graph condensation that leverages infinite-width network insights via NTK while respecting graph structure for effective downstream learning.
Abstract
The rapid development of Internet technology has given rise to a vast amount of graph-structured data. Graph Neural Networks (GNNs), an effective method for various graph mining tasks, incur substantial computational costs when dealing with large-scale graph data. A data-centric solution is to condense the large graph dataset into a smaller one without sacrificing the predictive performance of GNNs. However, existing approaches condense graph-structured data through a computationally intensive bi-level optimization architecture and therefore also suffer from massive computation costs. In this paper, we propose reformulating the graph condensation problem as a Kernel Ridge Regression (KRR) task instead of iteratively training GNNs in the inner loop of bi-level optimization. More specifically, we propose a novel dataset condensation framework (GC-SNTK) for graph-structured data, in which a Structure-based Neural Tangent Kernel (SNTK) is developed to capture the graph topology and serves as the kernel function in the KRR paradigm. Comprehensive experiments demonstrate the effectiveness of our proposed model in accelerating graph condensation while maintaining high prediction performance. The source code is available at https://github.com/WANGLin0126/GCSNTK.
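To make the KRR reformulation concrete, the following is a minimal NumPy sketch of the kind of objective a single-loop condensation could minimize. It is an illustration under stated assumptions, not the authors' implementation: the function names (`aggregate`, `rbf_kernel`, `krr_predict`, `condensation_loss`) are hypothetical, the mean-aggregation step is only one plausible way to inject structure, and an RBF kernel stands in for the SNTK, whose exact form is not given in this abstract.

```python
import numpy as np

def aggregate(adj, feats):
    """One round of mean neighborhood aggregation with self-loops.
    Assumption: a simple stand-in for structure-aware feature mixing."""
    adj_hat = adj + np.eye(adj.shape[0])
    deg = adj_hat.sum(axis=1, keepdims=True)
    return (adj_hat / deg) @ feats

def rbf_kernel(xa, xb, gamma=1.0):
    """RBF kernel between rows of xa and xb.
    Assumption: placeholder kernel; this is NOT the paper's SNTK."""
    sq = (xa ** 2).sum(1)[:, None] + (xb ** 2).sum(1)[None, :] - 2.0 * xa @ xb.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_predict(k_ts, k_ss, y_s, ridge=1e-3):
    """Closed-form KRR prediction: K_ts (K_ss + ridge * I)^{-1} Y_s."""
    n = k_ss.shape[0]
    return k_ts @ np.linalg.solve(k_ss + ridge * np.eye(n), y_s)

def condensation_loss(adj_t, x_t, y_t, adj_s, x_s, y_s, ridge=1e-3):
    """Squared error of KRR predictions on the target graph (t) when the
    condensed graph (s) is used as the support set."""
    h_t = aggregate(adj_t, x_t)
    h_s = aggregate(adj_s, x_s)
    pred = krr_predict(rbf_kernel(h_t, h_s), rbf_kernel(h_s, h_s), y_s, ridge)
    return 0.5 * np.sum((y_t - pred) ** 2)
```

Because the KRR solution is available in closed form, the loss depends directly on the condensed features, structure, and labels, so no inner training loop over a GNN is needed. In practice one would minimize this loss with respect to the condensed data using an automatic-differentiation framework; that step is omitted here.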
