Graph Data Condensation via Self-expressive Graph Structure Reconstruction

Zhanyu Liu, Chaolv Zeng, Guanjie Zheng

TL;DR

This work tackles graph data condensation by introducing GCSR, a framework that explicitly leverages the original graph structure to condense large graphs into small, interpretable synthetic graphs. It combines a three-module pipeline—Initialization, Self-expressive Reconstruction, and Update—with a self-expressive reconstruction objective that includes a regularizer derived from the original graph and a history term for stability. Empirical results across five real-world datasets and multiple GNN architectures show that GCSR consistently outperforms existing condensation methods, preserves inter-class relationships, and yields interpretable node-edge connections. The approach offers practical benefits for scalable GNN training and provides deeper insight into how condensed graphs reflect the structure of their originals.

Abstract

With the growing demand for training graph neural networks (GNNs) on large-scale graphs, graph data condensation has emerged as a critical technique for reducing storage and time costs during training. It aims to condense the original large-scale graph into a much smaller synthetic graph while preserving the information needed to train a downstream GNN efficiently. However, existing methods either optimize node features exclusively or learn node features and a graph structure generator independently. As a result, they cannot explicitly leverage the information in the original graph structure, and they fail to construct an interpretable graph structure for the synthetic dataset. To address these issues, we introduce a novel framework named Graph Data Condensation via Self-expressive Graph Structure Reconstruction (GCSR). Our method stands out by (1) explicitly incorporating the original graph structure into the condensation process and (2) capturing the nuanced interdependencies between the condensed nodes by reconstructing an interpretable self-expressive graph structure. Extensive experiments and comprehensive analysis validate the efficacy of the proposed method across diverse GNN models and datasets. Our code is available at https://github.com/zclzcl0223/GCSR.
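The reconstruction is described here only at a high level. To make the idea of a self-expressive structure concrete, here is a minimal NumPy sketch of the classic ridge-regularized self-expressive model from subspace clustering, with an extra term pulling the coefficients toward a prior matrix $P$. It is an illustrative assumption, not GCSR's published objective: the prior $P$ (a hypothetical stand-in for information derived from the original graph), the weights `alpha`, `beta`, `gamma`, the moving-average stand-in for the history term, and all function names are ours. For synthetic features $\textbf{X}'$ the sketch minimizes $\|\textbf{X}' - C\textbf{X}'\|_F^2 + \alpha\|C\|_F^2 + \beta\|C - P\|_F^2$; setting the gradient to zero gives the closed form $C = (\textbf{X}'\textbf{X}'^\top + \beta P)(\textbf{X}'\textbf{X}'^\top + (\alpha+\beta)I)^{-1}$.

```python
import numpy as np


def self_expressive_coefficients(X, P, alpha=1.0, beta=1.0):
    """Closed-form minimiser of ||X - C X||_F^2 + alpha*||C||_F^2 + beta*||C - P||_F^2.

    X: (n, d) synthetic node features.
    P: (n, n) prior structure (hypothetical stand-in for information from the original graph).
    """
    n = X.shape[0]
    G = X @ X.T                           # (n, n) Gram matrix of the synthetic nodes
    M = G + (alpha + beta) * np.eye(n)    # symmetric positive definite, hence invertible
    # Zero gradient gives C M = G + beta * P. Since M is symmetric,
    # C^T = M^{-1} (G + beta * P)^T, which np.linalg.solve computes without an explicit inverse.
    return np.linalg.solve(M, (G + beta * P).T).T


def coefficients_to_adjacency(C):
    """Symmetric, non-negative weighted adjacency with self-loops removed."""
    A = (np.abs(C) + np.abs(C).T) / 2.0
    np.fill_diagonal(A, 0.0)
    return A


def blend_with_history(A_new, A_prev, gamma=0.9):
    """Hypothetical history term: exponential moving average across iterations."""
    if A_prev is None:
        return A_new
    return gamma * A_prev + (1.0 - gamma) * A_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((6, 4))              # 6 synthetic nodes, 4 features
    P = np.zeros((6, 6))
    P[:3, :3] = P[3:, 3:] = 1.0 / 3.0            # hypothetical prior: two classes of 3 nodes
    A = None
    for _ in range(5):
        X = X + 0.01 * rng.standard_normal(X.shape)   # imitates feature updates between steps
        C = self_expressive_coefficients(X, P, alpha=0.5, beta=1.0)
        A = blend_with_history(coefficients_to_adjacency(C), A)
    print(np.round(A, 3))
```

The closed form avoids an inner optimization loop. The small perturbation of `X` in the demo only imitates synthetic features changing between condensation steps, so that the moving average has something to smooth.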

Paper Structure (28 sections, 18 equations, 6 figures, 12 tables, 1 algorithm)

Figures (6)

  • Figure 1: Learning pipelines of different graph condensation methods. $\textbf{X},\textbf{A}$ denote the original full dataset, and $\textbf{X}',\textbf{A}'$ denote the condensed dataset. SER is our proposed self-expressive graph structure reconstruction module, and Reg is the regularization term for reconstruction.
  • Figure 2: Overview of Graph Data Condensation via Self-expressive Graph Structure Reconstruction (GCSR).
  • Figure 3: Cross-Class Neighborhood Similarity (CCNS) of Citeseer generated from the synthetic graph with a 3.6% condensation ratio from (a) GCond, (b) SGDD, and (c) GCSR, as well as (d) the original graph. The axes represent the classes.
  • Figure 4: Visualizations of (a) the inner product of the synthetic nodes, (b) the updated probabilistic graph, and (c) the synthetic topology structure. The dataset illustrated is Citeseer with the condensation ratio set to 1.8%.
  • Figure 5: Distributions of the synthetic nodes condensed by four methods (GCond, SFGC, SGDD, and GCSR) on Cora under a 2.6% condensation ratio. $\text{SC}$ represents the average silhouette coefficient of the synthetic data.
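Figure 3 compares graphs through Cross-Class Neighborhood Similarity (CCNS). A minimal NumPy sketch, assuming the common definition in which each node is described by the histogram of its neighbors' labels and CCNS(c, c′) averages the cosine similarity of these histograms over all node pairs drawn from classes c and c′. For a weighted synthetic graph we assume the histogram is weighted by edge weights (D = A·Y); both this weighting and the function name `ccns` are our assumptions, not taken from the paper's code.

```python
import numpy as np


def ccns(A, labels, num_classes):
    """Cross-class neighbourhood similarity, returned as a (num_classes, num_classes) matrix.

    A: (n, n) adjacency, binary or weighted. labels: (n,) integer class ids.
    """
    Y = np.eye(num_classes)[labels]            # (n, k) one-hot labels
    D = A @ Y                                  # row v: edge-weighted histogram of neighbour labels
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    # Normalise rows for cosine similarity; isolated nodes keep a zero vector.
    D_hat = np.divide(D, norms, out=np.zeros_like(D), where=norms > 0)
    S = D_hat @ D_hat.T                        # (n, n) pairwise cosine similarities
    totals = Y.T @ S @ Y                       # summed similarity for every class pair
    counts = Y.sum(axis=0)
    denom = np.outer(counts, counts)           # |V_c| * |V_c'|
    return np.divide(totals, denom, out=np.zeros_like(totals), where=denom > 0)


if __name__ == "__main__":
    # Toy heterophilous graph: every edge joins class 0 to class 1.
    A = np.zeros((4, 4))
    for u, v in [(0, 2), (1, 3)]:
        A[u, v] = A[v, u] = 1.0
    print(ccns(A, np.array([0, 0, 1, 1]), num_classes=2))
```

In the toy graph no edge connects two same-class nodes, yet same-class nodes share identical neighbor-label histograms, so the diagonal entries are 1 and the off-diagonal entries are 0. The average silhouette coefficient (SC) reported in Figure 5 can be computed with scikit-learn's `sklearn.metrics.silhouette_score(X, labels)`.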