Why Does Dropping Edges Usually Outperform Adding Edges in Graph Contrastive Learning?

Yanchen Xu, Siqi Huang, Hongyuan Zhang, Xuelong Li

TL;DR

The paper asks why dropping edges often outperforms adding edges in graph contrastive learning. It introduces the Error Passing Rate (EPR), a metric quantifying how well a graph fits a GNN, shows theoretically that adding edges can increase or decrease EPR depending on class relationships, and derives conditions under which edge dropping is more stable. Building on this, it proposes EPAGCL, an EPR-driven adaptive augmentation that combines edge adding and edge dropping with a feature mask and an InfoNCE objective. Experiments on seven real-world datasets show improved performance and stability over baselines; the paper also analyzes efficiency trade-offs and reports ablations that validate the augmentation choices. The approach offers a principled augmentation framework for robust GCL, with practical relevance to node classification on diverse graphs.

Abstract

Graph contrastive learning (GCL) has been widely used as an effective self-supervised learning method for graph representation learning. However, how to apply adequate and stable graph augmentation to generate proper views for contrastive learning remains an essential problem. Dropping edges is a primary augmentation in GCL, while adding edges is not a common method due to its unstable performance. To the best of our knowledge, there is no theoretical analysis of why dropping edges usually outperforms adding edges. To answer this question, we introduce a new metric, namely Error Passing Rate (EPR), to quantify how well a graph fits the network. Inspired by the theoretical conclusions and the idea of positive-incentive noise, we propose a novel GCL algorithm, Error-PAssing-based Graph Contrastive Learning (EPAGCL), which uses both edge adding and edge dropping as its augmentations. Specifically, we generate views by adding and dropping edges based on the weights derived from EPR. Extensive experiments on various real-world datasets are conducted to validate the correctness of our theoretical analysis and the effectiveness of our proposed algorithm. Our code is available at: https://github.com/hyzhang98/EPAGCL.

Paper Structure

This paper contains 41 sections, 44 equations, 4 figures, 7 tables, 2 algorithms.

Figures (4)

  • Figure 1: Performance on WikiCS and CiteSeer with all hyper-parameters fixed. Edge adding and edge dropping are each employed as the only augmentation.
  • Figure 2: Framework of EPAGCL. Before training, the weights of all existing edges and of all candidate edges for adding are computed from the graph structure. Two views are then generated adaptively based on these weights: one view is obtained by both adding edges to and dropping edges from the graph, and the other by dropping edges only. A random feature mask is then applied. The two views are fed to a shared Graph Neural Network (GNN) with a projection head for representation learning, and the model is trained with a contrastive objective.
  • Figure 3: t-SNE embeddings of the raw features and learned embeddings obtained through EPAGCL on Cora and CiteSeer.
  • Figure 4: Performance improvement of five augmentation strategies compared to 'Random Add'.
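
The view-generation step described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `bernoulli_select`, `make_views`, and `mask_features` are hypothetical, the per-edge probabilities `drop_prob` and `add_prob` are placeholders standing in for the EPR-derived weights (whose formula is not given in this summary), and masking whole feature columns is an assumption about how the random feature mask is applied.

```python
import random


def bernoulli_select(items, prob, rng):
    """Return the items chosen by independent Bernoulli trials.

    prob maps each item to its selection probability in [0, 1].
    """
    return [x for x in items if rng.random() < prob[x]]


def make_views(edges, candidates, drop_prob, add_prob, seed=0):
    """Build two edge lists from one graph.

    View 1 drops some existing edges and adds some candidate edges.
    View 2 drops edges only, using an independent random draw.
    drop_prob / add_prob are placeholders for EPR-derived weights.
    """
    rng = random.Random(seed)
    dropped_1 = set(bernoulli_select(edges, drop_prob, rng))
    added_1 = bernoulli_select(candidates, add_prob, rng)
    view_1 = [e for e in edges if e not in dropped_1] + added_1

    dropped_2 = set(bernoulli_select(edges, drop_prob, rng))
    view_2 = [e for e in edges if e not in dropped_2]
    return view_1, view_2


def mask_features(x, mask_rate, rng):
    """Zero out whole feature columns, each with probability mask_rate.

    Column-wise masking is an assumption made for this sketch.
    """
    dim = len(x[0])
    keep = [rng.random() >= mask_rate for _ in range(dim)]
    return [[v if k else 0.0 for v, k in zip(row, keep)] for row in x]


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3)]
    candidates = [(0, 2), (1, 3)]
    # Uniform placeholder probabilities; a weighted scheme would vary these.
    drop_prob = {e: 0.3 for e in edges}
    add_prob = {e: 0.5 for e in candidates}
    v1, v2 = make_views(edges, candidates, drop_prob, add_prob, seed=42)
    print("view 1 edges:", v1)
    print("view 2 edges:", v2)
    features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print("masked:", mask_features(features, 0.5, random.Random(42)))
```

In a full pipeline, each view's edge list and masked features would then be passed to the shared GNN encoder and projection head, and the resulting embeddings compared with a contrastive loss. Only the second view is guaranteed to be a subgraph of the original, which reflects the asymmetric design described in the caption.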