Contrastive Graph Condensation: Advancing Data Versatility through Self-Supervised Learning
Xinyi Gao, Yayong Li, Tong Chen, Guanhua Ye, Wentao Zhang, Hongzhi Yin
TL;DR
This work tackles graph condensation under label scarcity by introducing Contrastive Graph Condensation (CTGC), a self-supervised framework that disentangles semantic and structural information via a dual-branch relay architecture. The semantic branch processes node attributes while the structural branch encodes geometric information through spectral embeddings using EigenMLP, with an alternating optimization that aligns both branches through clustering-based contrastive losses. Graph generation is achieved through a model-inversion process, recovering a condensed topology and node attributes from the learned centroids, enabling label-free pre-training for diverse downstream tasks. Empirical results on multiple datasets show CTGC consistently outperforms state-of-the-art GC methods, especially at high condensation ratios, and demonstrates strong generalization across GNN architectures and tasks such as node classification, link prediction, and clustering.
Abstract
With the increasing computational cost of training graph neural networks (GNNs) on large-scale graphs, graph condensation (GC) has emerged as a promising solution that synthesizes a compact substitute for the large-scale original graph to enable efficient GNN training. However, existing GC methods predominantly employ classification as the surrogate task for optimization, and thus rely excessively on node labels, which constrains their utility in label-sparse scenarios. More critically, this surrogate task tends to overfit class-specific information within the condensed graph, restricting the generalization capabilities of GC for other downstream tasks. To address these challenges, we introduce Contrastive Graph Condensation (CTGC), which adopts a self-supervised surrogate task to extract critical, causal information from the original graph and enhance the cross-task generalizability of the condensed graph. Specifically, CTGC employs a dual-branch framework to disentangle the generation of the node attributes and graph structures, where a dedicated structural branch is designed to explicitly encode geometric information through nodes' positional embeddings. By implementing an alternating optimization scheme with contrastive loss terms, CTGC promotes the mutual enhancement of both branches and facilitates high-quality graph generation through the model inversion technique. Extensive experiments demonstrate that CTGC excels in handling various downstream tasks with a limited number of labels, consistently outperforming state-of-the-art GC methods.
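To make the idea of positional embeddings concrete, the following is a minimal sketch of one common way to obtain spectral node embeddings: taking the eigenvectors of the symmetric normalized graph Laplacian with the smallest eigenvalues. It is an illustration under stated assumptions, not CTGC's implementation. The function name `spectral_positional_embeddings`, the choice of the normalized Laplacian, and the toy graph are all hypothetical, and the EigenMLP encoder that consumes such embeddings is not shown.

```python
import numpy as np


def spectral_positional_embeddings(adj, k):
    """Return the k smallest eigenvalues of the symmetric normalized
    Laplacian and the matching eigenvectors (one row per node).

    Assumption: `adj` is a dense, symmetric adjacency matrix. This is an
    illustrative choice, not necessarily the operator used by CTGC.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)

    # Guard against isolated nodes (degree 0) when computing D^{-1/2}.
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])

    # L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(lap)
    return eigvals[:k], eigvecs[:, :k]


# Toy graph (hypothetical): two triangles joined by a single edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

eigvals, emb = spectral_positional_embeddings(adj, k=3)
print(emb.shape)  # one k-dimensional positional embedding per node
```

Eigenvectors are defined only up to sign (and up to a change of basis when eigenvalues repeat), so any encoder that takes such embeddings as input has to cope with that ambiguity. Dense eigendecomposition also scales cubically with the number of nodes, so a sparse solver would be the usual choice for large graphs.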
