Bayesian Self-Supervised Contrastive Learning

Bin Liu; Bang Wang; Tianrui Li

TL;DR

This work tackles the challenge of false negatives in self-supervised contrastive learning by introducing Bayesian Self-Supervised Contrastive Learning (BCL), which reweights unlabeled negatives using importance sampling. By designing a target sampling distribution with parameters $\alpha$ (debiasing) and $\beta$ (hard negative mining), BCL computes weights $\omega_i$ that reflect the posterior probability that a sample is a true negative, enabling debiasing and harder negative separation without parametric density assumptions. The authors prove that, with $\beta=0.5$, $\mathcal{L}_{\text{Bcl}}$ is asymptotically unbiased with respect to the supervised loss $\mathcal{L}_{\text{sup}}$, and show finite-$N$ error bounds. Empirically, BCL improves performance over strong baselines like SimCLR, DCL, and HCL on multiple datasets and tasks, while maintaining comparable runtime, and provides practical guidance on hyperparameters and negative sample size. Overall, BCL offers a principled, non-parametric pathway to leverage unlabeled data more effectively in contrastive learning, with potential to enhance downstream representation quality.

Abstract

Recent years have witnessed many successful applications of contrastive learning in diverse domains, yet its self-supervised version still poses many challenges. Because negative samples are drawn from unlabeled data, a randomly selected sample may actually be a false negative for an anchor, leading to incorrect encoder training. This paper proposes a new self-supervised contrastive loss, the BCL loss, which still uses random samples from the unlabeled data while correcting the resulting bias with importance weights. The key idea is to design, under a Bayesian framework, the desired sampling distribution for drawing hard true negative samples. A prominent advantage is that the desired sampling distribution has a parametric structure, with a location parameter for debiasing false negatives and a concentration parameter for mining hard negatives. Experiments validate the effectiveness and superiority of the BCL loss.
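
The reweighting idea can be illustrated with a minimal sketch. This is not the paper's estimator: it assumes the posterior probability that each unlabeled negative is a true negative is already available (the hypothetical argument `p_true_neg`), and it uses an illustrative weight form, $\omega_i \propto p_i \exp(\beta \hat{x}_i / t)$, self-normalized so the weights average to 1. The function name, the weight form, and the default values are assumptions made for illustration only.

```python
import numpy as np

def weighted_contrastive_loss(pos_sim, neg_sims, p_true_neg, beta=0.5, t=0.5):
    """Illustrative importance-weighted InfoNCE-style loss for one anchor.

    pos_sim    : similarity between the anchor and its positive (scalar)
    neg_sims   : similarities between the anchor and N unlabeled negatives
    p_true_neg : ASSUMED posterior probability that each negative is a
                 true negative (hypothetical input, not computed here)
    beta       : ASSUMED hardness exponent; larger values up-weight
                 negatives that are more similar to the anchor
    t          : temperature
    """
    neg_sims = np.asarray(neg_sims, dtype=float)
    p_true_neg = np.asarray(p_true_neg, dtype=float)

    # Unnormalized weights: likely-true negatives that are also hard get more mass.
    raw = p_true_neg * np.exp(beta * neg_sims / t)
    # Self-normalize so the mean weight is 1 (uniform weights recover plain InfoNCE).
    weights = raw / raw.mean()

    pos_term = np.exp(pos_sim / t)
    neg_term = np.sum(weights * np.exp(neg_sims / t))
    loss = -np.log(pos_term / (pos_term + neg_term))
    return loss, weights
```

With `beta=0` and all posteriors equal to 1, every weight is 1 and the function reduces to the standard unweighted contrastive loss; a negative whose posterior is 0 (a certain false negative) contributes nothing to the denominator.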

Paper Structure (31 sections, 8 theorems, 60 equations, 10 figures, 10 tables, 4 algorithms)

Introduction
Related Work
The Proposed Method
False Negative and Hard Negative
Calculation of Importance Weights
Target Sampling Population
Actual Sampling Population
Monte Carlo Importance Sampling
Theoretical Analysis
Experiments
Numerical Experiment
Image Classification
Conclusion
Theoretical Analysis
Sampling Bias Analysis
...and 16 more sections

Key Result

Proposition 3.1

If $\phi(\hat{x})$ is a continuous density function that satisfies $\phi(\hat{x}) \geq 0$ and $\int_{-\infty}^{+\infty}\phi(\hat{x}) d\hat{x} =1$, then $\phi_{\textsc{Tn}}(\hat{x})$ and $\phi_{\textsc{Fn}}(\hat{x})$ are probability density functions that satisfy $\phi_{\textsc{Tn}}(\hat{x})\geq 0$, $\ph

Figures (10)

Figure 1: False negative samples and hard negative samples.
Figure 2: Two possible cases for the relative positions of anchor, positive, and negative triples.
Figure 3: $\phi_{\textsc{Tn}}(\hat{x})$ and $\phi_{\textsc{Fn}}(\hat{x})$ with different $\alpha$ settings.
Figure 4: Generation process of observation $\exp(\hat{x}/t)$.
Figure 5: Empirical distribution of $\exp(\hat{x}/t)$ with different $\alpha$.
...and 5 more figures

Theorems & Definitions (16)

Proposition 3.1: Class Conditional Density
proof
Lemma 3.2: Posterior Probability Estimation
proof
Lemma 3.3: Asymptotic Unbiased Estimation
proof
Lemma 3.4: Estimation Error Bound
proof
Proposition 1.1: Class Conditional Density
proof
...and 6 more
