Does GCL Need a Large Number of Negative Samples? Enhancing Graph Contrastive Learning with Effective and Efficient Negative Sampling

Yongqi Huang, Jitao Zhao, Dongxiao He, Di Jin, Yuxiao Huang, Zhen Wang

TL;DR

This work questions the widely held belief that more negative samples always improve Graph Contrastive Learning (GCL). The authors show that, due to topological coupling, a large pool of negatives can actually impair semantic discrimination and demonstrate that a small, carefully selected set of high-quality negatives suffices. They introduce E2Neg, which uses spectral clustering-based centrality sampling and topology reconstruction to create a few representative negatives and localized subgraphs, enabling efficient, center-focused contrastive training. Across six datasets, E2Neg achieves competitive or superior node classification performance while delivering orders-of-magnitude improvements in memory and speed, highlighting a practical path to scalable, effective GCL on large graphs.
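
The TL;DR names spectral clustering-based centrality sampling but gives no details. As a rough sketch under stated assumptions, the selection step might look like the code below. Node embeddings (for example, spectral coordinates taken from leading eigenvectors of a graph Laplacian) are assumed to be precomputed and passed in. Lloyd's k-means groups them, and the node nearest each centroid is kept as a representative. The names `kmeans` and `centrality_sample` and the deterministic initialization are hypothetical choices for illustration, not E2Neg's actual implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50) -> List[List[float]]:
    """Lloyd's k-means; the first k points seed the centroids so runs are reproducible."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        updated = []
        for c, members in enumerate(clusters):
            if not members:  # an empty cluster keeps its previous centroid
                updated.append(centroids[c])
                continue
            dim = len(members[0])
            updated.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
        if updated == centroids:  # centroids stopped moving
            break
        centroids = updated
    return centroids


def centrality_sample(embeddings: List[Vector], k: int) -> List[int]:
    """Hypothetical helper: indices of the nodes closest to each k-means centroid."""
    chosen = set()
    for centroid in kmeans(embeddings, k):
        chosen.add(min(range(len(embeddings)), key=lambda i: sq_dist(embeddings[i], centroid)))
    return sorted(chosen)


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
print(centrality_sample(points, 2))  # [0, 3]
```

On two well-separated groups of points, `centrality_sample(points, 2)` returns one node from each group. Picking the node nearest each centroid, rather than the centroid itself, keeps every selected negative a real node of the graph.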

Abstract

Graph Contrastive Learning (GCL) aims to learn low-dimensional graph representations in a self-supervised manner, primarily through instance discrimination: positive and negative pairs are manually mined from the graph, and the model increases the similarity of positive pairs while decreasing that of negative pairs. Drawing on the success of Contrastive Learning (CL) in other domains, a consensus has emerged that the effectiveness of GCL depends on a large number of negative pairs. As a result, despite the significant computational overhead, GCL methods typically use as many negative node pairs as possible to improve performance. However, because nodes within a graph are interconnected, we argue that they cannot be treated as independent instances. We therefore challenge this consensus: does employing more negative nodes lead to a more effective GCL model? To answer this, we examine the role of negative nodes in the commonly used InfoNCE loss for GCL and observe that: (1) counterintuitively, a large number of negative nodes can actually hinder the model's ability to distinguish nodes with different semantics; (2) a smaller number of high-quality, non-topologically coupled negative nodes is sufficient to enhance the discriminability of representations. Based on these findings, we propose a new method, GCL with Effective and Efficient Negative samples (E2Neg), which learns discriminative representations using only a very small set of representative negative samples. E2Neg significantly reduces computational overhead and speeds up model training. We demonstrate the effectiveness and efficiency of E2Neg against other GCL methods across multiple datasets.
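
The abstract turns on how negatives enter the InfoNCE loss. For reference, a simplified single-anchor form of InfoNCE, with cosine similarity s(·,·) and temperature τ, can be sketched as follows. It is a generic form written for illustration, not necessarily the exact loss used in the paper; some GCL methods, for instance, also add intra-view negatives. The names `cosine` and `info_nce` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector], tau: float = 0.5) -> float:
    """-log( e^{s(a,p)/tau} / (e^{s(a,p)/tau} + sum_n e^{s(a,n)/tau}) ), computed stably."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max before exponentiating (log-sum-exp trick)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


anchor, positive = [1.0, 0.0], [1.0, 0.0]
print(info_nce(anchor, positive, [[0.0, 1.0]], tau=1.0))              # log(1 + e^-1) ≈ 0.3133
print(info_nce(anchor, positive, [[0.0, 1.0], [0.0, 1.0]], tau=1.0))  # log(1 + 2/e) ≈ 0.5514
print(info_nce(anchor, positive, [[1.0, 0.0]], tau=1.0))              # log 2 ≈ 0.6931
```

The third call shows the abstract's concern in miniature: a "negative" that is identical to the anchor raises the loss to log 2, and lowering it would require separating two nodes with the same semantics.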

Paper Structure

This paper contains 26 sections, 2 theorems, 15 equations, 2 figures, 4 tables.

Key Result

Theorem 1

Assume that a graph contains $k$ semantic blocks $\mathcal{S}=\{\mathcal{S}_{1}, \cdots, \mathcal{S}_{k} \}$, corresponding to a set of core semantics $s = \{s_1, \dots, s_k\}$, where each $s_j$ is the core semantic associated with the block $\mathcal{S}_j$. If an anchor node $v_i$ belongs to the se […] where $s_j$ is the core semantic of the block $\mathcal{S}_j$, and $\epsilon_i$ represents the indi […]
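
The statement of Theorem 1 is cut off, so its exact form cannot be recovered here. Purely as an illustrative assumption suggested by the surviving fragment, suppose an anchor's representation splits additively as $\bm{h}_i = s_j + \epsilon_i$, the core semantic of its block plus an individual term, and likewise $\bm{h}_k = s_{j'} + \epsilon_k$ for another node $v_k$ in block $\mathcal{S}_{j'}$. Expanding their inner product gives:

```latex
\langle \bm{h}_i, \bm{h}_k \rangle
  = \langle s_j, s_{j'} \rangle
  + \langle s_j, \epsilon_k \rangle
  + \langle \epsilon_i, s_{j'} \rangle
  + \langle \epsilon_i, \epsilon_k \rangle,
\qquad
j = j' \;\Rightarrow\; \langle s_j, s_{j'} \rangle = \lVert s_j \rVert^2 .
```

Under this assumed form, if the individual terms are small relative to the core semantics, a node from the anchor's own block stays highly similar to the anchor, so using it as a negative works against their shared semantics. This is one way to read the abstract's observation (1), not the paper's proof.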

Figures (2)

  • Figure 1: Given a graph $\mathcal{G}$, we use a centrality sampling strategy to select a representative set of nodes from $\mathcal{G}$. Next, we perform topological reconstruction based on the neighbors of these nodes to generate a reconstructed graph $\mathcal{\hat{G}}$. The function $Aug(\cdot)$ is a custom augmentation used to create the augmented graph $\mathcal{\tilde{G}}$. $\mathcal{\hat{G}}$ and $\mathcal{\tilde{G}}$ are then fed into the encoder $f$ to produce the representations $\bm{\hat{H}}$ and $\bm{\tilde{H}}$. Finally, we compute the loss on the representative node set.
  • Figure 2: Hyperparameter analysis, where * marks the parameter value used by E2Neg and N denotes the number of nodes in each dataset.
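
Figure 1's caption describes a topology reconstruction step built from the neighbors of the sampled representative nodes. One hedged reading, sketched below, builds a separate localized subgraph for each center from the center and its 1-hop neighbors, relabels nodes so the subgraphs share no vertices, and records where each center landed so the loss can be restricted to those positions. The function name `reconstruct_topology`, the 1-hop radius, and the choice to keep subgraphs disjoint are assumptions, not details confirmed by the caption.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as a symmetric adjacency map


def reconstruct_topology(adj: Graph, centers: List[int]) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Hypothetical sketch: one induced 1-hop subgraph per center, relabeled to be disjoint.

    Returns (edges, original_ids, center_positions): edges use the new node ids,
    original_ids[new_id] is the node's id in the input graph, and center_positions
    holds the new ids of the centers (the nodes the loss would be computed on).
    """
    edges: List[Tuple[int, int]] = []
    original_ids: List[int] = []
    center_positions: List[int] = []
    for c in centers:
        members = [c] + sorted(adj[c] - {c})
        offset = len(original_ids)
        local = {v: offset + i for i, v in enumerate(members)}
        original_ids.extend(members)
        center_positions.append(local[c])
        # Keep each input edge whose endpoints both fall inside this subgraph.
        for u in members:
            for v in adj[u]:
                if v in local and u < v:
                    a, b = local[u], local[v]
                    edges.append((min(a, b), max(a, b)))
    return edges, original_ids, center_positions


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
edges, original_ids, center_positions = reconstruct_topology(adj, centers=[0, 4])
print(sorted(edges))     # [(0, 1), (0, 2), (1, 2), (3, 4)]
print(original_ids)      # [0, 1, 2, 4, 3]
print(center_positions)  # [0, 3]
```

In the example, the edge between nodes 2 and 3 is dropped because it would link two different localized subgraphs; whether E2Neg keeps or discards such cross edges is not stated in this summary.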

Theorems & Definitions (4)

  • Definition 1: Receptive Field $\mathcal{R}$
  • Definition 2: Semantic Block $\mathcal{S}$
  • Theorem 1: Semantic Block-Based Decomposition
  • Theorem 2: Sample Threshold Gradient Boundary
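
The wording of Definition 1 is not reproduced above. In the usual sense for message-passing encoders, the receptive field of a node under an L-layer GNN is its L-hop neighborhood, which a breadth-first search can compute; the sketch below assumes that reading. The helper names are hypothetical, and the overlap test is only one possible way to make "topological coupling" between two nodes concrete.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected graph as a symmetric adjacency map


def receptive_field(adj: Graph, node: int, num_layers: int) -> Set[int]:
    """Nodes within num_layers hops of `node`, including `node` itself."""
    seen = {node}
    queue = deque([(node, 0)])
    while queue:
        u, depth = queue.popleft()
        if depth == num_layers:  # do not expand beyond the last layer
            continue
        for v in adj.get(u, set()):
            if v not in seen:
                seen.add(v)
                queue.append((v, depth + 1))
    return seen


def fields_overlap(adj: Graph, a: int, b: int, num_layers: int) -> bool:
    """Hypothetical coupling test: do the two receptive fields share any node?"""
    return bool(receptive_field(adj, a, num_layers) & receptive_field(adj, b, num_layers))


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}  # 0-1-2-3-4
print(receptive_field(path, 0, 2))    # {0, 1, 2}
print(fields_overlap(path, 0, 4, 1))  # False: {0, 1} and {3, 4} are disjoint
print(fields_overlap(path, 0, 4, 2))  # True: both fields contain node 2
```

Because BFS visits nodes in order of hop distance, each node is reached first along a shortest path, so the depth cutoff yields exactly the nodes within `num_layers` hops.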