StructComp: Substituting Propagation with Structural Compression in Training Graph Contrastive Learning

Shengzhong Zhang; Wenjie Yang; Xinyuan Cao; Hongwei Zhang; Zengfeng Huang

TL;DR

StructComp introduces a scalable training framework for graph contrastive learning by substituting costly message passing with node compression based on a sparse low-rank diffusion approximation. It uses a graph partition to form compressed features $X_c = P^T X$ and a compressed graph $A_c = P^T A P$, enabling training with an MLP encoder on $X_c$ while retaining full graph structure for inference. Theoretical results establish that the compressed loss closely approximates the original GCL loss with a bound depending on the partition remainder, and multi-view StructComp adds a regularization effect that improves robustness. Empirically, StructComp significantly reduces memory and time requirements while achieving competitive or improved accuracy across small and large graphs, demonstrating strong scalability for diverse GCL models.
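The compression step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes `P` is a plain 0/1 cluster-indicator matrix built from hypothetical partition labels, so $X_c = P^T X$ sums features within each cluster and $A_c = P^T A P$ counts edges between clusters. Any normalization of `P` used in the paper is left out, and the helper names `partition_matrix` and `compress` are made up for this example.

```python
import numpy as np

def partition_matrix(labels, num_clusters):
    """Build an n x n' indicator matrix: P[i, c] = 1 if node i is in cluster c.
    (Assumption: unnormalized 0/1 entries; the paper may normalize P.)"""
    n = len(labels)
    P = np.zeros((n, num_clusters))
    P[np.arange(n), labels] = 1.0
    return P

def compress(A, X, labels, num_clusters):
    """Return compressed features X_c = P^T X and compressed graph A_c = P^T A P."""
    P = partition_matrix(labels, num_clusters)
    X_c = P.T @ X
    A_c = P.T @ A @ P
    return X_c, A_c

# Toy input: a path graph 0-1-2-3 split into clusters {0, 1} and {2, 3}.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 0.0],
              [0.0, 2.0]])
labels = np.array([0, 0, 1, 1])

X_c, A_c = compress(A, X, labels, num_clusters=2)
print(X_c)  # one row per cluster
print(A_c)  # diagonal counts intra-cluster edge entries, off-diagonal counts cross-cluster edges
```

With an unnormalized indicator matrix, the diagonal of `A_c` counts each undirected intra-cluster edge twice, because both symmetric entries of `A` are summed.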

Abstract

Graph contrastive learning (GCL) has become a powerful tool for learning graph data, but its scalability remains a significant challenge. In this work, we propose a simple yet effective training framework called Structural Compression (StructComp) to address this issue. Inspired by a sparse low-rank approximation on the diffusion matrix, StructComp trains the encoder with the compressed nodes. This allows the encoder not to perform any message passing during the training stage, and significantly reduces the number of sample pairs in the contrastive loss. We theoretically prove that the original GCL loss can be approximated with the contrastive loss computed by StructComp. Moreover, StructComp can be regarded as an additional regularization term for GCL models, resulting in a more robust encoder. Empirical studies on various datasets show that StructComp greatly reduces the time and memory consumption while improving model performance compared to the vanilla GCL models and scalable training methods.
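To make the training/inference split concrete, the sketch below contrasts the two forward passes with a single shared weight matrix. It is only an illustration under stated assumptions: the encoder is a one-layer linear map, the partition labels and the ring-shaped adjacency are invented toy data, the function names `encode_train` and `encode_infer` are hypothetical, and the pair counts assume a loss that contrasts all pairs of rows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_c, d, d_out = 6, 2, 3, 2           # nodes, compressed nodes, input dim, output dim

# Toy ring graph with self-loops (assumption: any adjacency would do here).
I = np.eye(n)
A = I + np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1)

X = rng.normal(size=(n, d))             # node features
labels = np.array([0, 0, 0, 1, 1, 1])   # hypothetical partition into n_c clusters
P = np.eye(n_c)[labels]                 # n x n_c one-hot partition matrix
W = rng.normal(size=(d, d_out))         # shared encoder weights

def encode_train(X_c, W):
    """Training-time pass: no adjacency matrix, so no message passing."""
    return X_c @ W

def encode_infer(A, X, W):
    """Inference-time pass: one propagation step over the full graph."""
    return A @ X @ W

Z_train = encode_train(P.T @ X, W)      # one embedding per compressed node
Z_infer = encode_infer(A, X, W)         # one embedding per original node

# Sample pairs seen by an all-pairs contrastive loss (illustrative assumption).
pairs_full = n * n
pairs_compressed = n_c * n_c
print(Z_train.shape, Z_infer.shape, pairs_full, pairs_compressed)
```

The same `W` is used in both passes, which is what lets weights fitted on the compressed nodes be reused on the full graph at inference time.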

Paper Structure (37 sections, 4 theorems, 30 equations, 4 figures, 15 tables)

This paper contains 37 sections, 4 theorems, 30 equations, 4 figures, 15 tables.

Introduction
Preliminaries
Structural Compression
Motivation
Framework of StructComp
Theory analysis of StructComp
The equivalence of the compressed loss and the original loss
The regularization introduced by StructComp
Related work
Scalable training on graph
Experiment
Experimental Setup
Experimental Results
Performance on small-scale datasets
Time and memory usage for small-scale datasets
...and 22 more sections

Key Result

Theorem 4.1

For the random graph $G(n,p)$ from the Erdős-Rényi model, we construct an even partition $\mathcal{P}=\{S_1,\cdots,S_{n'}\}$. Let $f_G(X)=AXW$ be a feature mapping in the original graph and $f_\mathcal{P}(X)=P^{' T} XW$ be a linear mapping for the mixed nodes, where $W\in\mathbb{R}^{d\times d'}$. Then b

Figures (4)

Figure 1: The overall framework of single-view StructComp.
Figure 2: The training process of multi-view StructComp.
Figure 3: The trends of the original GCL loss and the loss computed with StructComp-trained parameters. "loss_o" is $\mathcal{L}(A,X;W)$ and "loss_c" is $\mathcal{L}(A,X;U)$, where $U$ is trained with $\mathcal{L}(X_c;U)$.
Figure 4: The influence of the compression rate on the performance of StructComp.

Theorems & Definitions (6)

Theorem 4.1
Theorem 4.2
Theorem 4.1
proof
Theorem 4.2
proof
