Efficient Graph Condensation via Gaussian Process

Lin Wang, Qing Li

TL;DR

This work introduces Graph Condensation via Gaussian Process (GCGP), a training-free condensation framework that replaces costly bi-level GNN optimization with Gaussian process regression on a condensed graph. A graph-structure covariance function that incorporates $k$-hop neighborhood information and a concrete relaxation of the adjacency matrix enable efficient, differentiable optimization of the condensed graph. Empirical results across seven datasets show GCGP achieves competitive or superior condensation quality with substantial speedups over state-of-the-art methods, and the method generalizes across multiple downstream GNN architectures. The approach significantly improves scalability for graph learning on large-scale graphs while maintaining predictive performance, with code publicly available.
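To make the relaxation of the adjacency matrix concrete, the sketch below samples a continuous, symmetric adjacency matrix from a binary Concrete (Gumbel-sigmoid) distribution. It is a minimal illustration under stated assumptions, not the authors' implementation: the function name `relaxed_adjacency`, the `logits` parameterization, the `temperature` default, and the choice of an undirected graph without self-loops are all hypothetical.

```python
import math
import random


def relaxed_adjacency(logits, temperature=0.5, rng=None):
    """Sample a symmetric, continuous adjacency matrix with entries in [0, 1].

    logits[i][j] is a (hypothetical) learnable edge parameter for the node
    pair (i, j). Only the upper triangle is read; the result is mirrored.
    """
    rng = rng or random.Random()
    n = len(logits)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Logistic noise: log(u) - log(1 - u) with u ~ Uniform(0, 1).
            u = min(max(rng.random(), 1e-9), 1.0 - 1e-9)
            noise = math.log(u) - math.log(1.0 - u)
            # Binary Concrete sample: sigmoid((logit + noise) / temperature),
            # written with tanh so that large |z| cannot overflow.
            z = (logits[i][j] + noise) / temperature
            value = 0.5 * (1.0 + math.tanh(z / 2.0))
            adj[i][j] = adj[j][i] = value
    return adj
```

Because every entry is a smooth function of its logit, gradients can flow through the sampled matrix. Lower temperatures push the entries toward 0 or 1, so the samples approach a binary adjacency matrix.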

Abstract

Graph condensation reduces the size of large graphs while preserving performance, addressing the scalability challenges that Graph Neural Networks (GNNs) face on large datasets. Existing methods often rely on bi-level optimization, which requires extensive GNN training and limits their scalability. To address these issues, this paper proposes Graph Condensation via Gaussian Process (GCGP), a novel and computationally efficient approach to graph condensation. GCGP uses a Gaussian Process (GP), with the condensed graph serving as observations, to estimate the posterior distribution of predictions. This approach eliminates the iterative and resource-intensive training typically required by GNNs. To improve GCGP's ability to capture dependencies between function values, we derive a specialized covariance function that incorporates structural information. This covariance function broadens the receptive field of input nodes through local neighborhood aggregation, which helps represent intricate dependencies among the nodes. To address the challenge of optimizing binary structural information in condensed graphs, Concrete random variables are used to approximate the binary adjacency matrix with a continuous counterpart. This relaxation makes the adjacency matrix differentiable, enabling gradient-based optimization of discrete graph structures. Experimental results show that GCGP efficiently condenses large-scale graph data while preserving predictive performance, addressing both scalability and efficiency challenges. The implementation of our method is publicly available at https://github.com/WANGLin0126/GCGP.
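The idea of predicting with a GP whose observations are the condensed graph can be sketched in a few lines. The example below is an illustration under assumptions, not the covariance function derived in the paper: it stands in a linear kernel on features averaged over $k$ hops of neighbors, and uses the standard GP regression posterior mean. All function names, the `noise` term, and the toy data are hypothetical, and a practical implementation would use a numerical library instead of the small pure-Python solver.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def aggregate(adj, X, k):
    """Average each node's features with its neighbours', repeated k times."""
    n = len(adj)
    # Add self-loops, then row-normalise so every row sums to one.
    loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    P = [[v / sum(row) for v in row] for row in loops]
    for _ in range(k):
        X = matmul(P, X)
    return X


def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def gp_posterior_mean(adj_s, X_s, Y_s, adj_t, X_t, k=2, noise=1e-2):
    """Posterior mean K_ts (K_ss + noise * I)^{-1} Y_s for the target nodes."""
    H_s = aggregate(adj_s, X_s, k)  # condensed-graph representations
    H_t = aggregate(adj_t, X_t, k)  # original-graph representations
    H_s_T = [list(col) for col in zip(*H_s)]
    K_ss = matmul(H_s, H_s_T)
    K_ts = matmul(H_t, H_s_T)
    for i in range(len(K_ss)):
        K_ss[i][i] += noise
    return matmul(K_ts, solve(K_ss, Y_s))


# Toy data: two condensed nodes (one per class) and four original nodes.
adj_s = [[0.0, 0.0], [0.0, 0.0]]
X_s = [[1.0, 0.0], [0.0, 1.0]]
Y_s = [[1.0, 0.0], [0.0, 1.0]]
adj_t = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
X_t = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]

pred = gp_posterior_mean(adj_s, X_s, Y_s, adj_t, X_t)
classes = [row.index(max(row)) for row in pred]  # classes == [0, 0, 1, 1]
```

No GNN is trained at any point: the prediction is a closed-form function of the condensed features, labels, and adjacency, which is what makes the condensed graph directly optimizable.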

Paper Structure

This paper contains 21 sections, 20 equations, 7 figures, 7 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Graph condensation reduces a large graph dataset $G$ into a smaller one $G^{\mathcal{S}}$ with fewer nodes, while preserving essential information.
  • Figure 2: Condensation time and accuracy comparison on the Reddit dataset. By condensing 153,431 training nodes into a graph with only 77 synthetic nodes, GCGP achieves an accuracy of 93.9% on the test set, which is comparable to the accuracy of a GCN trained on the full training set, while also having the fastest runtime. For more experimental results on condensation efficiency, see Section \ref{sec:effi}.
  • Figure 3: The workflow of the proposed GCGP framework involves three key steps. First, the condensed synthetic graph $G^{\mathcal{S}}$ is utilized as the observations for the GP. Next, predictions are generated for the test locations, corresponding to the original graph $G$. Finally, the condensed graph is iteratively optimized by minimizing the discrepancy between the GP's predictions and the ground-truth labels.
  • Figure 4: Runtime efficiency of GC-SNTK, SimGC, DisCo, and GCGP, with the One-step method as the baseline. The speedup factor of each method is the runtime of the One-step method divided by that method's runtime, so larger values indicate greater acceleration relative to the baseline.
  • Figure 5: Comparison of condensation efficiency between the proposed GCGP method and GC-SNTK on five datasets. The x-axis shows training time and the y-axis shows test accuracy. GCGP consistently trains faster than GC-SNTK across all condensation scales on these datasets.
  • ...and 2 more figures
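
The workflow in the Figure 3 caption and the speedup factor in the Figure 4 caption can be written compactly. This is a sketch under assumptions, not the paper's exact formulation: it uses the standard GP regression posterior mean and takes the discrepancy to be a squared error. The symbols $Y$ (labels of the original graph), $Y^{\mathcal{S}}$ (labels of the condensed graph), $\mathcal{K}$ (covariance function), $\lambda$ (noise term), and $t_m$ (runtime of method $m$) are introduced here for illustration only.

```latex
\[
\hat{Y} = \mathcal{K}(G, G^{\mathcal{S}})
          \left[ \mathcal{K}(G^{\mathcal{S}}, G^{\mathcal{S}}) + \lambda I \right]^{-1}
          Y^{\mathcal{S}},
\qquad
\min_{G^{\mathcal{S}}} \; \bigl\lVert Y - \hat{Y} \bigr\rVert_F^2,
\qquad
\mathrm{speedup}(m) = \frac{t_{\text{One-step}}}{t_m}.
\]
```

The first expression gives the predictions at the original graph's nodes, the second is the objective minimized over the condensed graph, and the third restates the Figure 4 definition, where a value above 1 means method $m$ runs faster than the One-step baseline.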