Efficient Graph Condensation via Gaussian Process
Lin Wang, Qing Li
TL;DR
This work introduces Graph Condensation via Gaussian Process (GCGP), a condensation framework that replaces costly bi-level GNN optimization with Gaussian process regression, treating the condensed graph as the observations. Two components make the condensed graph efficient to optimize: a graph-structure covariance function that incorporates $k$-hop neighborhood information, and a Concrete relaxation of the binary adjacency matrix that makes the condensed structure differentiable. Empirical results across seven datasets show that GCGP achieves competitive or superior condensation quality with substantial speedups over state-of-the-art methods, and that the condensed graphs generalize across multiple downstream GNN architectures. The approach improves the scalability of graph learning on large-scale graphs while maintaining predictive performance. Code is publicly available.
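To make the Concrete relaxation concrete, here is a minimal NumPy sketch of how a binary adjacency matrix can be replaced by a continuous sample with entries in (0, 1). The function name `sample_relaxed_adjacency`, the logit parameterization, and the default temperature are illustrative assumptions, not taken from the GCGP implementation. NumPy only shows the forward sampling step; an actual optimizer would need an automatic-differentiation framework to backpropagate through the logits.

```python
import numpy as np


def sample_relaxed_adjacency(logits, temperature=0.5, rng=None):
    """Draw a symmetric relaxed adjacency matrix from a binary Concrete distribution.

    `logits` is an (n, n) array of learnable edge scores (hypothetical
    parameterization). Lower temperatures push samples closer to {0, 1}.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = logits.shape[0]
    # Logistic noise: log(u) - log(1 - u) with u ~ Uniform(0, 1).
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=(n, n))
    noise = np.log(u) - np.log1p(-u)
    # Binary Concrete sample: sigmoid((logits + noise) / temperature).
    soft = 1.0 / (1.0 + np.exp(-(logits + noise) / temperature))
    # Keep the strict upper triangle and mirror it, so the relaxed graph
    # is undirected and has no self-loops.
    upper = np.triu(soft, k=1)
    return upper + upper.T
```

Because each sample is a smooth function of `logits`, a gradient-based optimizer can update the edge scores directly instead of searching over discrete edge sets.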
Abstract
Graph condensation reduces the size of large graphs while preserving performance, addressing the scalability challenges that Graph Neural Networks (GNNs) face on large datasets. Existing methods often rely on bi-level optimization, which requires extensive GNN training and limits their scalability. To address these issues, this paper proposes Graph Condensation via Gaussian Process (GCGP), a novel and computationally efficient approach to graph condensation. GCGP uses a Gaussian Process (GP), with the condensed graph serving as observations, to estimate the posterior distribution of predictions. This eliminates the iterative and resource-intensive training typically required by GNNs. To enhance the ability of GCGP to capture dependencies between function values, we derive a specialized covariance function that incorporates structural information. This covariance function broadens the receptive field of input nodes through local neighborhood aggregation, which helps represent intricate dependencies among nodes. To address the challenge of optimizing binary structural information in condensed graphs, Concrete random variables are used to approximate the binary adjacency matrix with a continuous counterpart. This relaxation makes the adjacency matrix differentiable, so gradient-based optimization can be applied to discrete graph structures. Experimental results show that GCGP efficiently condenses large-scale graph data while preserving predictive performance. The implementation of our method is publicly available at https://github.com/WANGLin0126/GCGP.
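The idea of using the condensed graph as GP observations can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the covariance function derived in the paper: it assumes a linear (dot-product) base kernel applied to features aggregated over $k$ hops with a symmetrically normalized adjacency matrix. The names `normalize_adjacency`, `aggregate`, and `gp_posterior_mean`, and the `noise` regularizer, are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a non-negative adjacency matrix A."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def aggregate(features, adj, k=2):
    """Propagate node features over k hops, i.e. compute S^k X."""
    s = normalize_adjacency(adj)
    out = features
    for _ in range(k):
        out = s @ out
    return out


def gp_posterior_mean(x_s, adj_s, y_s, x_t, adj_t, k=2, noise=1e-2):
    """GP posterior mean at target nodes, with the condensed graph as observations.

    (x_s, adj_s, y_s): condensed features, adjacency, and labels.
    (x_t, adj_t): features and adjacency of the graph to predict on.
    """
    h_s = aggregate(x_s, adj_s, k)
    h_t = aggregate(x_t, adj_t, k)
    # Assumed kernel: dot product of k-hop aggregated features.
    k_ss = h_s @ h_s.T
    k_ts = h_t @ h_s.T
    # Closed-form GP regression: K_ts (K_ss + noise * I)^{-1} Y_s.
    alpha = np.linalg.solve(k_ss + noise * np.eye(h_s.shape[0]), y_s)
    return k_ts @ alpha
```

The prediction is a closed-form linear solve whose cost depends on the number of condensed nodes, so no GNN is trained in the inner loop. A condensation objective could compare `gp_posterior_mean(...)` on the original graph's training nodes with their labels and update the condensed features and adjacency accordingly.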
