CGCL: Collaborative Graph Contrastive Learning without Handcrafted Graph Data Augmentations

Tianyu Zhang, Yuxiang Ren, Wenzheng Feng, Weitao Du, Xuecang Zhang

TL;DR

This paper addresses the instability of handcrafted augmentations in unsupervised graph contrastive learning by introducing Collaborative Graph Contrastive Learning (CGCL), a framework in which multiple heterogeneous GNN encoders produce the contrastive views, so the views come from the encoders rather than from perturbed copies of the graph. By enforcing an asymmetric architecture and encouraging complementary encoders, CGCL avoids reliance on perturbations and mitigates model collapse. Two metrics, the Asymmetry Coefficient ($AC$) and the Complementarity Coefficient ($CC$), quantify how well an encoder assembly meets these two design principles. Experiments on nine graph benchmarks show CGCL achieving state-of-the-art or competitive graph-level representations without augmentations, with results improving when the encoder assembly exhibits high asymmetry and complementarity. The work also provides a theoretical and empirical basis for evaluating encoder assemblies and releases its implementation for reproducibility.

Abstract

Unsupervised graph representation learning is a non-trivial topic. The success of contrastive methods in unsupervised representation learning on structured data inspires similar attempts on graphs. Existing graph contrastive learning (GCL) aims to learn the invariance across multiple augmentation views, which renders it heavily reliant on handcrafted graph augmentations. However, inappropriate graph data augmentations can jeopardize such invariance. In this paper, we show the potential hazards of inappropriate augmentations and then propose a novel Collaborative Graph Contrastive Learning framework (CGCL). This framework harnesses multiple graph encoders to observe the graph. Features observed from different encoders serve as the contrastive views in contrastive learning, which avoids inducing unstable perturbation and guarantees the invariance. To ensure the collaboration among diverse graph encoders, we propose the concepts of asymmetric architecture and complementary encoders as the design principles. To further justify this design, we use two quantitative metrics to measure the asymmetry and complementarity of CGCL's assembly, respectively. Extensive experiments demonstrate the advantages of CGCL in unsupervised graph-level representation learning and the potential of the collaborative framework. The source code for reproducibility is available at https://github.com/zhangtia16/CGCL
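
The idea of using encoder outputs as contrastive views can be illustrated with a small sketch. The abstract does not give the loss, so everything here is an assumption made for illustration: an InfoNCE-style objective, the temperature `tau`, the helper names, and random arrays standing in for GNN encoder outputs. It is not the authors' implementation. For each encoder, a graph's embedding is the anchor, the same graph's embedding from another encoder is the positive, and the other graphs in the batch are negatives.

```python
import numpy as np


def l2_normalize(z):
    """Scale each row to unit length so dot products are cosine similarities."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def cross_encoder_info_nce(anchor, other, tau=0.5):
    """InfoNCE loss with rows of `anchor` as queries and rows of `other` as keys.

    Row n of both arrays embeds the same graph (the positive pair); every
    other row of `other` acts as a negative. `tau` is an assumed temperature.
    """
    sim = l2_normalize(anchor) @ l2_normalize(other).T / tau  # shape (N, N)
    sim = sim - sim.max(axis=1, keepdims=True)  # shift for numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def collaborative_losses(views, tau=0.5):
    """One loss per encoder: its mean InfoNCE against every other encoder's view."""
    losses = []
    for i, anchor in enumerate(views):
        others = [v for j, v in enumerate(views) if j != i]
        losses.append(float(np.mean([cross_encoder_info_nce(anchor, o, tau) for o in others])))
    return losses


rng = np.random.default_rng(0)
# Hypothetical stand-ins for three encoders' embeddings of a mini-batch of 8 graphs.
views = [rng.normal(size=(8, 16)) for _ in range(3)]
losses = collaborative_losses(views)
print(losses)
```

No graph is perturbed anywhere in this sketch: the only source of variation between views is the encoder, which is the property the abstract emphasizes.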

Paper Structure

This paper contains 20 sections, 6 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: An illustration of the unstable invariance of three graph augmentation strategies. The value of each augmented graph is its similarity to the original graph. The upper part of each augmentation strategy shows augmented graphs preserving high invariance, while the lower part's augmentations bring low invariance.
  • Figure 2: Framework overview of CGCL. Graph Encoders 1, 2, $\cdots$, $k$ embed the mini-batch graphs into low-dimensional vectors. To optimize the framework collaboratively, each graph encoder calculates its own contrastive loss with the help of the others.
  • Figure 3: Calculation of correlation between RDMs. With $k$ encoders, the Asymmetry Coefficient is calculated by averaging the correlation between any pair of encoders.
  • Figure 4: The performance of CGCL's assembly with respect to asymmetry and complementarity over multiple datasets. Point A indicates an example of an assembly with a high AC and a high CC. Point B has a high AC and a low CC, while point C has a low AC and a high CC. Points D and E refer to examples whose AC and CC are both low.
  • Figure 5: Empirical convergence study of different graph encoders in CGCL on PROTEINS and IMDB-BINARY.
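
The pairwise averaging described in the Figure 3 caption can be sketched as follows. Several choices are assumptions, not details from the paper: reading RDM as a representational dissimilarity matrix, using 1 minus cosine similarity as the dissimilarity, using Pearson correlation, and all function names. The caption also does not say how the averaged correlation maps to the final $AC$ value (for example, any sign or offset), so the sketch stops at the average itself.

```python
from itertools import combinations

import numpy as np


def rdm(z):
    """Dissimilarity matrix over graphs: 1 - cosine similarity for each pair of rows."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return 1.0 - z @ z.T


def rdm_correlation(z_a, z_b):
    """Pearson correlation between the upper triangles of two encoders' RDMs."""
    iu = np.triu_indices(z_a.shape[0], k=1)  # each graph pair once, no diagonal
    return float(np.corrcoef(rdm(z_a)[iu], rdm(z_b)[iu])[0, 1])


def mean_pairwise_rdm_correlation(views):
    """Average the RDM correlation over every unordered pair of encoders."""
    return float(np.mean([rdm_correlation(a, b) for a, b in combinations(views, 2)]))


rng = np.random.default_rng(0)
# Hypothetical embeddings of the same 10 graphs from three encoders with
# different output sizes; the RDMs are all 10 x 10, so they stay comparable.
views = [rng.normal(size=(10, d)) for d in (8, 16, 32)]
score = mean_pairwise_rdm_correlation(views)
print(score)
```

Comparing RDMs rather than raw embeddings means the encoders do not need a shared embedding dimension or coordinate system, which suits an assembly of heterogeneous encoders.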

Theorems & Definitions (2)

  • Definition 1
  • Definition 2