Contrastive Learning with Negative Sampling Correction

Lu Wang; Chao Du; Pu Zhao; Chuan Luo; Zhangchi Zhu; Bo Qiao; Wei Zhang; Qingwei Lin; Saravan Rajmohan; Dongmei Zhang; Qi Zhang

TL;DR

The paper tackles negative sampling bias in contrastive learning by recasting negatives as unlabeled data within a Positive-Unlabeled (PU) learning framework. Under the single-training-set and Selected Completely At Random (SCAR) assumptions, it expresses the negative distribution in terms of the unlabeled and positive distributions and derives a debiased contrastive loss, DeCL. It then shows that the gap between this loss and the ideal unbiased loss vanishes as the negative sample size $N$ grows: $|\mathcal{L}_{IdealCL}-\mathcal{L}_{DeCL}| \le \tfrac{1}{2\sqrt{N}}(e^{2}-1)$. Empirically, PUCL consistently improves image and graph representation learning over SimCLR, CMC, MoCo, and InfoGraph baselines, with substantial gains in several settings, and is robust to the hyperparameters $\alpha$ and $c$. The approach provides a principled, broadly applicable correction for negative sampling bias that can be integrated with existing CL frameworks, improving downstream classification performance across data modalities.
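
To give a feel for how quickly the stated bound $\tfrac{1}{2\sqrt{N}}(e^{2}-1)$ shrinks, the short sketch below simply evaluates it for a few negative sample sizes. The function name `decl_bound` and the chosen values of $N$ are illustrative assumptions, not part of the paper.

```python
import math


def decl_bound(n_neg: int) -> float:
    """Evaluate the stated upper bound (e^2 - 1) / (2 * sqrt(N)).

    `n_neg` is the negative sample size N. The helper name is hypothetical.
    """
    if n_neg <= 0:
        raise ValueError("n_neg must be a positive integer")
    return (math.exp(2) - 1) / (2 * math.sqrt(n_neg))


if __name__ == "__main__":
    # The bound decays as 1/sqrt(N): quadrupling N halves it.
    for n in (64, 510, 2040, 8160):
        print(f"N={n:5d}  bound={decl_bound(n):.4f}")
```

Because the bound scales as $1/\sqrt{N}$, quadrupling the number of negatives halves it.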

Abstract

As one of the most effective self-supervised representation learning methods, contrastive learning (CL) relies on multiple negative pairs to contrast against each positive pair. In standard practice, data augmentation methods are used to generate both positive and negative pairs. While existing work has focused on improving positive sampling, the negative sampling process is often overlooked. In fact, the generated negative samples are often polluted by positive samples, which leads to a biased loss and degraded performance. To correct this negative sampling bias, we propose a novel contrastive learning method named Positive-Unlabeled Contrastive Learning (PUCL). PUCL treats the generated negative samples as unlabeled samples and uses information from positive samples to correct the bias in the contrastive loss. We prove that the corrected loss used in PUCL incurs only a negligible bias compared to the unbiased contrastive loss. PUCL can be applied to general contrastive learning problems and outperforms state-of-the-art methods on various image and graph classification tasks. The code of PUCL is in the supplementary file.
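
The idea of treating negatives as unlabeled samples and correcting with positive information can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes $\alpha$ is the positive class prior and $c$ the label frequency, and it assumes the standard PU mixture identity $p_n = \big((1-\alpha c)\,p_u - \alpha(1-c)\,p_p\big)/(1-\alpha)$; the exact form used in the paper is not reproduced in this summary. The function name, the single-positive estimate, and the clipping floor are likewise illustrative choices.

```python
import numpy as np


def pu_debiased_nce(sim_pos, sim_unl, alpha, c, t=1.0):
    """PU-style debiased InfoNCE for one anchor (hypothetical sketch).

    sim_pos : similarity between the anchor and its positive, in [-1, 1]
    sim_unl : similarities between the anchor and N unlabeled samples
    alpha   : assumed positive class prior, 0 <= alpha < 1
    c       : assumed label frequency, 0 <= c <= 1
    t       : temperature
    """
    sim_unl = np.asarray(sim_unl, dtype=float)
    n = sim_unl.size
    pos = np.exp(sim_pos / t)
    unl_mean = np.exp(sim_unl / t).mean()

    # Assumed mixture identity: subtract the positive share hidden in the
    # unlabeled samples to estimate the expectation over true negatives.
    neg_mean = ((1 - alpha * c) * unl_mean - alpha * (1 - c) * pos) / (1 - alpha)

    # exp(sim / t) cannot fall below exp(-1 / t) when sim is in [-1, 1],
    # so clip the estimate to keep the logarithm well defined.
    neg_mean = max(neg_mean, np.exp(-1.0 / t))

    return -np.log(pos / (pos + n * neg_mean))


if __name__ == "__main__":
    sims = [0.1, -0.3, 0.5]
    print("uncorrected:", pu_debiased_nce(0.8, sims, alpha=0.0, c=0.01))
    print("corrected:  ", pu_debiased_nce(0.8, sims, alpha=0.1, c=0.01))
```

With `alpha=0` the correction term disappears and the function reduces to the usual InfoNCE loss over the unlabeled samples, which makes it a convenient sanity check.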

Paper Structure (21 sections, 2 theorems, 9 equations, 3 figures, 2 tables)

Introduction
Related Work
Contrastive Representation Learning
Positive-Unlabeled Learning
Preliminary
Modeling Positive and Unlabeled Data
Contrastive Learning in Positive-Unlabeled Learning Framework
Negative Sampling Bias in Contrastive Loss
Method
Representation of Negative Distribution
Correcting Contrastive Loss with Positive and Unlabeled Data
Experiments
Experimental Settings
Image Representation Learning
Results for Baseline SimCLR ($N = 510$)
...and 6 more sections

Key Result

Lemma 1

Under the single-training-set scenario and the SCAR assumption, the negative sample distribution can be represented by the unlabeled and positive distributions as follows:
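
As a hedged sketch of how a decomposition of this kind can arise (not necessarily the paper's exact statement), assume $\alpha$ denotes the positive class prior, $c$ the label frequency (the probability that a positive is labeled), and $p_p$, $p_n$, $p_u$ the positive, negative, and unlabeled densities. These symbol meanings are assumptions made for illustration. In a single training set drawn from $\alpha\,p_p + (1-\alpha)\,p_n$, a fraction $\alpha c$ of samples is labeled, and under SCAR the unlabeled positives are still distributed as $p_p$, so

$$
p_u(x) = \frac{\alpha(1-c)\,p_p(x) + (1-\alpha)\,p_n(x)}{1-\alpha c}
\quad\Longrightarrow\quad
p_n(x) = \frac{(1-\alpha c)\,p_u(x) - \alpha(1-c)\,p_p(x)}{1-\alpha}.
$$

Setting $\alpha = 0$ recovers $p_n = p_u$, the case in which unlabeled samples contain no hidden positives.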

Figures (3)

Figure 1: An illustration of the generation of positive and negative/unlabeled data in common contrastive learning. Treating unlabeled data as negatives can lead to a biased loss when some positive samples are mislabeled as negatives (the dog on the right). Since unlabeled data $x^u$ are sampled from partial data of $p$, a debiased loss can be formed by treating the unlabeled samples correctly.
Figure 2: Embedding visualization.
Figure 3: Classification accuracy with (a) different unlabeled sample sizes $M^u$ and $\alpha$, and (b) different $\alpha$ and $c$. Embeddings are trained using SimCLR + PUCL and evaluated on CIFAR10 and STL10. For (a), we trained with different batch sizes (which give different unlabeled sample sizes in each step) and different $\alpha$, keeping $c = 0.01$. For (b), we trained with different $\alpha$ and $c$.

Theorems & Definitions (2)

Lemma 1
Theorem 1
