Provable Training for Graph Contrastive Learning

Yue Yu; Xiao Wang; Mengmei Zhang; Nian Liu; Chuan Shi

TL;DR

Graph Contrastive Learning (GCL) trains nodes unevenly: under graph augmentations, some nodes follow the GCL principle far better than others. This work introduces node compactness, a per-node measure of how well a node follows that principle, and derives a provable lower bound on it via bound propagation. PrOvable Training (POT) uses this bound as a regularizer, adding a BCE-based penalty to the standard InfoNCE objective so that node embeddings align with the GCL principle across augmentations; it can be plugged into existing GCL methods. Experiments on eight datasets show that POT improves multiple baselines, increases node compactness, and is robust to the choice of augmentation.

Abstract

Graph Contrastive Learning (GCL) has emerged as a popular training approach for learning node embeddings from augmented graphs without labels. Although the key principle of maximizing the similarity between positive node pairs while minimizing it between negative node pairs is well established, some fundamental problems remain unclear. Considering the complex graph structure, are some nodes consistently well trained and following this principle even under different graph augmentations? Or are some nodes more likely to be untrained across graph augmentations and to violate the principle? How can these nodes be distinguished, and how can this guide the training of GCL? To answer these questions, we first present experimental evidence showing that the training of GCL is indeed imbalanced across nodes. To address this problem, we propose the metric "node compactness", which is the lower bound of how well a node follows the GCL principle over the range of augmentations. We further derive the form of node compactness theoretically through bound propagation, which can be integrated into binary cross-entropy as a regularization. To this end, we propose PrOvable Training (POT) for GCL, which regularizes the training of GCL to encode node embeddings that follow the GCL principle better. Through extensive experiments on various benchmarks, POT consistently improves existing GCL approaches, serving as a friendly plugin.
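To make the idea of "InfoNCE plus a binary cross-entropy regularizer" concrete, here is a minimal per-node sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names, the temperature `tau`, the convex weighting by `kappa`, and the choice to feed the compactness lower bound to the BCE term as a logit with target 1 are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, tau=0.5):
    """InfoNCE loss for one anchor node (tau is an assumed temperature)."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def bce_with_logit(score, target=1.0):
    """Binary cross-entropy between sigmoid(score) and target."""
    return target * softplus(-score) + (1.0 - target) * softplus(score)


def combined_loss(nce, compactness_lower_bound, kappa=0.3):
    """Hypothetical weighting: (1 - kappa) * InfoNCE + kappa * BCE penalty.

    The penalty shrinks as the node's compactness lower bound grows,
    i.e. as the node provably follows the GCL principle more closely.
    """
    penalty = bce_with_logit(compactness_lower_bound, target=1.0)
    return (1.0 - kappa) * nce + kappa * penalty


# One anchor with an identical positive and an opposite negative.
nce = info_nce([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
loss_compact = combined_loss(nce, compactness_lower_bound=2.0)
loss_loose = combined_loss(nce, compactness_lower_bound=-2.0)
```

In this sketch a node with a higher compactness lower bound contributes a smaller regularization term, so gradient pressure concentrates on nodes whose bound is low.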

Paper Structure (32 sections, 3 theorems, 14 equations, 9 figures, 1 table, 1 algorithm)

Introduction
Preliminaries
The Imbalanced Training of GCL: an Experimental Study
Methodology
Evaluating How the Nodes Follow the GCL Principle
Provable Training of GCL
Deriving the Lower Bounds
Defining the adjacency matrix with continuous values
Relaxing the nonlinearities in the GCN encoder
Time Complexity
Experiments
Experimental Setup
Node Classification
Analyzing the Node Compactness
Hyperparameter Analysis
...and 17 more sections

Key Result

Theorem 1

If $f=\sigma(\mathbf{\hat{A}} \sigma(\mathbf{\hat{A}} \mathbf{X} \mathbf{W}^{(1)} + \mathbf{b}^{(1)}) \mathbf{W}^{(2)} + \mathbf{b}^{(2)})$ is the two-layer GCN encoder, given the element-wise bounds of $\mathbf{\hat{A}}$ in Definition 4, $\mathbf{H}^{(k)}$ is the input embedding ... where $[\mathbf{X}]_+ = \max (\mathbf{X}, 0)$ and $[\mathbf{X}]_- = \min (\mathbf{X}, 0)$.
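The positive/negative split $[\cdot]_+$, $[\cdot]_-$ is what lets element-wise bounds on $\mathbf{\hat{A}}$ propagate through a matrix product. The sketch below illustrates only that single step, not the full theorem: given hypothetical bounds `A_lo <= A <= A_hi` and a fixed matrix `M` (standing in for something like $\mathbf{H}\mathbf{W}$), it bounds `A @ M` element-wise. All names are assumptions for this example, and pure-Python lists are used to keep it dependency-free.

```python
def pos_part(M):
    """[M]_+ = max(M, 0), element-wise."""
    return [[max(x, 0.0) for x in row] for row in M]


def neg_part(M):
    """[M]_- = min(M, 0), element-wise."""
    return [[min(x, 0.0) for x in row] for row in M]


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def add(A, B):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def bound_product(A_lo, A_hi, M):
    """Element-wise bounds on A @ M for any A with A_lo <= A <= A_hi.

    A positive entry of M is minimized by the smallest A entry and a
    negative entry by the largest, which gives the split below.
    """
    Mp, Mn = pos_part(M), neg_part(M)
    lower = add(matmul(A_lo, Mp), matmul(A_hi, Mn))
    upper = add(matmul(A_hi, Mp), matmul(A_lo, Mn))
    return lower, upper


# Hypothetical bounds on a 2x2 message-passing matrix and a fixed M.
A_lo = [[0.0, 0.2], [0.1, 0.5]]
A_hi = [[0.5, 0.6], [0.4, 1.0]]
M = [[1.0, -2.0], [-3.0, 4.0]]
lower, upper = bound_product(A_lo, A_hi, M)
```

Any matrix between `A_lo` and `A_hi`, for instance their midpoint, yields a product that lies between `lower` and `upper` entry by entry.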

Figures (9)

Figure 1: The imbalance of GCL training
Figure 2: Node compactness in the training process
Figure 3: Degree and node compactness score
Figure 4: Hyperparameter analysis on $\kappa$
Figure 5: More results showing the imbalance of GCL training
...and 4 more figures

Theorems & Definitions (10)

Definition 1: Node Compactness
Definition 2: $G_2$-Node Compactness
Definition 3: Augmented adjacency matrix
Definition 4: Augmented message-passing matrix
Definition 5: Linear bounds of non-linear activation function
Theorem 1: Pre-activation bounds of each layer
Theorem 2: The lower bound of neural network output
proof
Lemma 1: Bound propagation of affine layers
proof
