Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching

Yuchen Zhang, Tianle Zhang, Kai Wang, Ziyao Guo, Yuxuan Liang, Xavier Bresson, Wei Jin, Yang You

TL;DR

This paper addresses lossless graph condensation. It identifies biases in the supervision signals of prior trajectory-matching methods and introduces GEOM, which trains expert trajectories with curriculum learning and then uses expanding window matching to transfer their more diverse supervision signals into a condensed graph. A knowledge embedding extractor draws further information from the expert trajectories, and a theoretical analysis justifies the design. Experiments across five datasets and multiple GNN architectures show GEOM achieving lossless condensation at condensation ratios at or below 5%, with strong cross-architecture generalization. The work substantially reduces the computational burden of training GNNs on large graphs and offers a practical path toward scalable graph learning, though it relies on pre-computed expert trajectories and leaves efficiency improvements to future work.

Abstract

Graph condensation aims to reduce the size of a large-scale graph dataset by synthesizing a compact counterpart without sacrificing the performance of Graph Neural Networks (GNNs) trained on it, which has shed light on reducing the computational cost for training GNNs. Nevertheless, existing methods often fall short of accurately replicating the original graph for certain datasets, thereby failing to achieve the objective of lossless condensation. To understand this phenomenon, we investigate the potential reasons and reveal that the previous state-of-the-art trajectory matching method provides biased and restricted supervision signals from the original graph when optimizing the condensed one. This significantly limits both the scale and efficacy of the condensed graph. In this paper, we make the first attempt toward lossless graph condensation by bridging the previously neglected supervision signals. Specifically, we employ a curriculum learning strategy to train expert trajectories with more diverse supervision signals from the original graph, and then effectively transfer the information into the condensed graph with expanding window matching. Moreover, we design a loss function to further extract knowledge from the expert trajectories. Theoretical analysis justifies the design of our method and extensive experiments verify its superiority across different datasets. Code is released at https://github.com/NUS-HPC-AI-Lab/GEOM.
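
The abstract names expanding window matching without spelling out its schedule. The following is only a minimal sketch of the general idea, assuming that the window of expert-trajectory start epochs begins small and widens linearly with the condensation iteration until it reaches a cap. The function names, the parameters (initial_upper, max_upper, growth_every), and the linear schedule are all hypothetical illustrations and are not taken from the paper or its released code.

```python
import random


def window_upper_bound(iteration, initial_upper, max_upper, growth_every=1):
    """Inclusive upper bound of the start-epoch window at a condensation iteration.

    Assumption (not from the paper): the bound grows by one expert epoch every
    `growth_every` iterations and is capped at `max_upper`.
    """
    grown = initial_upper + iteration // growth_every
    return min(grown, max_upper)


def sample_start_epoch(rng, iteration, initial_upper, max_upper, growth_every=1):
    """Sample an expert-trajectory start epoch uniformly from the current window."""
    upper = window_upper_bound(iteration, initial_upper, max_upper, growth_every)
    # random.Random.randint is inclusive on both ends.
    return rng.randint(0, upper)


if __name__ == "__main__":
    rng = random.Random(0)
    for it in (0, 50, 500):
        upper = window_upper_bound(it, initial_upper=10, max_upper=100)
        start = sample_start_epoch(rng, it, initial_upper=10, max_upper=100)
        print(f"iteration={it} window=[0, {upper}] sampled_start={start}")
```

Under this assumed schedule, early iterations sample only early segments of the expert trajectories, and later iterations can also draw from later segments while still revisiting the early ones.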

Paper Structure (20 sections, 1 theorem, 3 equations, 5 figures, 5 tables)

Key Result

Theorem 3.1

During the evaluation phase, the accumulated error at any stage is determined by its initial value, the sum of the matching errors, and the initialization error starting from the second stage.

Figures (5)

  • Figure 1: (a) and (b) show the ablation on whether the knowledge embedding extractor (KEE) is used. (c) and (d) show the ablation on the tunable hyperparameter $\alpha$, which sets the weight of the optimization term generated by the KEE.
  • Figure 2: t-SNE visualization of the condensed graph. Nodes of the same class share the same color. SC$\uparrow$, DB$\downarrow$, and CH$\uparrow$ in the figure refer to the Silhouette Coefficient, Davies-Bouldin Index, and Calinski-Harabasz Index, respectively. $\uparrow$ and $\downarrow$ indicate that the clustering pattern is better when the value is higher or lower, respectively.
  • Figure 3: Comparison of methods for evaluating and storing condensed graphs.
  • Figure 4: Performance with different combinations of $q$ student steps and $p$ expert steps on Ogbn-arxiv ($r$ = 0.5%).
  • Figure 5: t-SNE visualization of condensed graphs.

Theorems & Definitions (2)

  • Theorem 3.1
  • proof