Graph Condensation for Open-World Graph Learning

Xinyi Gao, Tong Chen, Wentao Zhang, Yayong Li, Xiangguo Sun, Hongzhi Yin

TL;DR

OpenGC tackles the challenge of scalable GNN training on evolving open-world graphs by introducing a graph condensation framework whose condensed graphs generalize across distribution shifts. It replaces the traditional, computationally heavy relay-GNN pipeline with a kernel ridge regression (KRR) based condenser and non-parametric graph convolution, enabling closed-form updates and faster condensation. Temporal invariance condensation constructs multiple structure-aware temporal environments from the original graph and optimizes the condensed graph with an invariant risk minimization objective, so that it captures patterns that stay stable over time and yields robust performance under continual graph changes. Empirical results on Yelp, Taobao, Flickr, and Coauthor show that OpenGC outperforms state-of-the-art GC methods in accuracy and condensation speed and generalizes across GNN architectures, highlighting its practical value for lifelong, open-world graph learning.
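The TL;DR names two ingredients: non-parametric graph convolution and a KRR-based condenser whose relay model is solved in closed form. The following is a minimal NumPy sketch of both, assuming SGC-style propagation $\hat{A}^k X$ with GCN normalization and a linear kernel. The function names, the ridge parameter `lam`, and the class-mean initialization are illustrative assumptions, not the authors' code.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """GCN-style normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]


def propagate(adj: np.ndarray, x: np.ndarray, k: int = 2) -> np.ndarray:
    """Non-parametric graph convolution H = A_norm^k X: no trainable weights."""
    a_norm = normalized_adjacency(adj)
    h = x
    for _ in range(k):
        h = a_norm @ h
    return h


def krr_condensation_loss(h, y, h_cond, y_cond, lam=1e-2):
    """Fit linear-kernel KRR on the condensed embeddings in closed form and
    return its mean squared error on the original node embeddings."""
    k_cc = h_cond @ h_cond.T                                          # condensed x condensed kernel
    alpha = np.linalg.solve(k_cc + lam * np.eye(len(k_cc)), y_cond)   # closed-form dual coefficients
    pred = (h @ h_cond.T) @ alpha                                     # original x condensed kernel
    return float(np.mean((pred - y) ** 2))


# Toy 4-node cycle graph with 3 random features and 2 classes (one-hot labels).
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
y = np.eye(2)[[0, 0, 1, 1]]
h = propagate(adj, x, k=2)
# Hypothetical initialization: one condensed node per class, set to the class-mean embedding.
h_cond = np.stack([h[y[:, c] == 1].mean(axis=0) for c in range(2)])
y_cond = np.eye(2)
loss = krr_condensation_loss(h, y, h_cond, y_cond)
```

Because the KRR weights come from a single linear solve, only the condensed embeddings `h_cond` would need to be learned, for example by differentiating this loss with an autodiff library; that outer optimization loop is not shown.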

Abstract

The burgeoning volume of graph data presents significant computational challenges in training graph neural networks (GNNs), critically impeding their efficiency in various applications. To tackle this challenge, graph condensation (GC) has emerged as a promising acceleration solution, focusing on the synthesis of a compact yet representative graph for efficiently training GNNs while retaining performance. Despite their potential to promote scalable use of GNNs, existing GC methods only align the condensed graph with the observed, static graph distribution. This limitation significantly restricts the generalization capacity of condensed graphs, particularly in adapting to dynamic distribution changes. In real-world scenarios, however, graphs are dynamic and constantly evolving, with new nodes and edges being continually integrated. Consequently, applications that employ GC for efficient GNN training end up with sub-optimal GNNs when confronted with evolving graph structures and distributions. To overcome this issue, we propose open-world graph condensation (OpenGC), a robust GC framework that integrates structure-aware distribution shift to simulate evolving graph patterns and exploits the resulting temporal environments for invariance condensation. This approach is designed to extract temporally invariant patterns from the original graph, thereby enhancing the generalization capabilities of the condensed graph and, subsequently, of the GNNs trained on it. Extensive experiments on both real-world and synthetic evolving graphs demonstrate that OpenGC outperforms state-of-the-art (SOTA) GC methods in adapting to dynamic changes in open-world graph environments.
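Invariance condensation asks the condensed graph to support a model that performs equally well across temporal environments. As a hedged sketch, the snippet below scores closed-form KRR on each environment and combines the per-environment risks as their mean plus `beta` times their variance, a V-REx-style stand-in for an invariant risk minimization objective. This particular form of the loss, the perturbation-based toy environments, and all function names are assumptions for illustration only.

```python
import numpy as np


def krr_risk(h_env, y, h_cond, y_cond, lam=1e-2):
    """Risk of a closed-form linear-kernel KRR fitted on the condensed data
    (h_cond, y_cond) and evaluated on one environment's embeddings h_env."""
    k_cc = h_cond @ h_cond.T
    alpha = np.linalg.solve(k_cc + lam * np.eye(len(k_cc)), y_cond)
    pred = (h_env @ h_cond.T) @ alpha
    return float(np.mean((pred - y) ** 2))


def invariance_condensation_loss(envs, y, h_cond, y_cond, beta=1.0, lam=1e-2):
    """Mean risk over environments plus beta times the variance of the risks.

    The variance term (a V-REx-style penalty, assumed here) favours condensed
    data whose fitted model has the same risk in every environment.
    """
    risks = np.array([krr_risk(h_e, y, h_cond, y_cond, lam) for h_e in envs])
    return float(risks.mean() + beta * risks.var()), risks


rng = np.random.default_rng(1)
h = rng.normal(size=(6, 4))                     # embeddings of 6 original nodes
y = np.eye(2)[[0, 0, 0, 1, 1, 1]]               # one-hot labels
h_cond = rng.normal(size=(2, 4))                # 2 condensed nodes (the free variables)
y_cond = np.eye(2)
# Hypothetical environments: the original embeddings under increasingly strong perturbations.
envs = [h + s * rng.normal(size=h.shape) for s in (0.0, 0.5, 1.0)]
loss, risks = invariance_condensation_loss(envs, y, h_cond, y_cond, beta=1.0)
```

With `beta=0` the loss reduces to plain average risk; increasing `beta` trades average accuracy for consistency across environments.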

Paper Structure (23 sections, 16 equations, 5 figures, 8 tables, 1 algorithm)

Figures (5)

  • Figure 1: The upper panel presents the evolution of the graph. The graph $\mathcal{T}_i$ expands as tasks $T_i$ progress, and different node colors represent distinct classes. The lower panel shows the test accuracy on consecutive tasks on the Yelp and Taobao datasets. The test model is a GCN trained on the condensed graph of the initial task $T_{1}$ and applied to subsequent tasks without fine-tuning. The test set expands with the tasks, and evaluation is limited to nodes belonging to the classes in $T_{1}$.
  • Figure 2: The pipeline of OpenGC. The graph $\mathcal{T}_t$ and the historical graph $\mathcal{T}_{t-1}$ are encoded by non-parametric convolution, and the resulting embeddings are used to construct temporal environments $\mathbf{H}^e_{t}$. The condensed graph embedding $\mathbf{H}'_t$ is generated by optimizing the temporal invariance condensation loss $\mathcal{L}_{TIC}$. In the deployment stage, the condensed graph is utilised to train multiple GNNs with various architectures, which are applied to sequential tasks $\mathcal{T}_{j\ge t}$.
  • Figure 3: Heatmaps of the differences between the performance matrices (%) of OpenGC and SFGC on the Yelp, Taobao, Flickr and Coauthor datasets (from left to right). Softmax is adopted, and the compression ratios are 1%, 1%, 0.1%, and 1%, respectively.
  • Figure 4: t-SNE visualization of the condensed graph generated by OpenGC.
  • Figure 5: The hyper-parameter sensitivity analysis.
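The Figure 2 caption states that $\mathcal{T}_t$ and $\mathcal{T}_{t-1}$ are encoded by non-parametric convolution and that the embeddings are used to build temporal environments $\mathbf{H}^e_t$. The sketch below illustrates one plausible construction, which is an assumption rather than the paper's exact rule: treat the change in the embeddings of historical nodes caused by newly arrived nodes and edges as a structure-aware shift, and extrapolate along it with randomly sampled magnitudes. The helper names, `max_scale`, and the convention that historical nodes come first in $\mathcal{T}_t$ are hypothetical.

```python
import numpy as np


def propagate(adj, x, k=2):
    """Non-parametric graph convolution H = A_norm^k X, A_norm = D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    h = x
    for _ in range(k):
        h = a_norm @ h
    return h


def temporal_environments(adj_prev, x_prev, adj_curr, x_curr,
                          num_envs=3, max_scale=1.0, k=2, seed=0):
    """Hypothetical environment construction for the nodes of the historical graph.

    Assumes the first len(x_prev) nodes of the current graph are the historical nodes.
    """
    n_old = x_prev.shape[0]
    h_prev = propagate(adj_prev, x_prev, k)
    h_curr = propagate(adj_curr, x_curr, k)[:n_old]    # current embeddings of historical nodes
    shift = h_curr - h_prev                             # change induced by new nodes and edges
    rng = np.random.default_rng(seed)
    scales = rng.uniform(0.0, max_scale, size=num_envs)
    envs = [h_curr + s * shift for s in scales]         # extrapolate along the observed shift
    return h_curr, shift, envs


# T_{t-1}: path graph 0-1-2.  T_t: adds node 3, connected to node 0.
adj_prev = np.array([[0, 1, 0],
                     [1, 0, 1],
                     [0, 1, 0]], dtype=float)
adj_curr = np.zeros((4, 4))
adj_curr[:3, :3] = adj_prev
adj_curr[0, 3] = adj_curr[3, 0] = 1.0
x_prev = np.eye(3)                                  # toy one-hot node features
x_curr = np.vstack([x_prev, np.ones((1, 3))])       # features of the new node
h_curr, shift, envs = temporal_environments(adj_prev, x_prev, adj_curr, x_curr, num_envs=4)
```

If the graph does not change between snapshots, the shift is zero and every environment collapses to the current embeddings, so the environments only diversify when new structure actually arrives.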