
Bayesian Self-Supervised Contrastive Learning

Bin Liu, Bang Wang, Tianrui Li

TL;DR

This work tackles the challenge of false negatives in self-supervised contrastive learning by introducing Bayesian Self-Supervised Contrastive Learning (BCL), which reweights unlabeled negatives using importance sampling. By designing a target sampling distribution with parameters $\alpha$ (debiasing) and $\beta$ (hard negative mining), BCL computes weights $\omega_i$ that reflect the posterior probability that a sample is a true negative, enabling debiasing and harder negative separation without parametric density assumptions. The authors prove that, with $\beta=0.5$, $\mathcal{L}_{\text{Bcl}}$ is asymptotically unbiased with respect to the supervised loss $\mathcal{L}_{\text{sup}}$, and show finite-$N$ error bounds. Empirically, BCL improves performance over strong baselines like SimCLR, DCL, and HCL on multiple datasets and tasks, while maintaining comparable runtime, and provides practical guidance on hyperparameters and negative sample size. Overall, BCL offers a principled, non-parametric pathway to leverage unlabeled data more effectively in contrastive learning, with potential to enhance downstream representation quality.

Abstract

Recent years have witnessed many successful applications of contrastive learning in diverse domains, yet its self-supervised version still poses many exciting challenges. Because negative samples are drawn from unlabeled datasets, a randomly selected sample may actually be a false negative for an anchor, leading to incorrect encoder training. This paper proposes a new self-supervised contrastive loss, the BCL loss, that still uses random samples from the unlabeled data while correcting the resulting bias with importance weights. The key idea is to design the desired sampling distribution for sampling hard true negative samples under the Bayesian framework. A prominent advantage is that the desired sampling distribution has a parametric structure, with a location parameter for debiasing false negatives and a concentration parameter for mining hard negatives. Experiments validate the effectiveness and superiority of the BCL loss.
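
The idea of keeping randomly drawn negatives but reweighting them can be illustrated with a minimal sketch. It assumes an InfoNCE-style loss for a single anchor in which each negative term $\exp(\hat{x}_i/t)$ is multiplied by an importance weight. The function name `weighted_contrastive_loss`, the argument names, and the example weights are hypothetical, and the weights are passed in by hand: the sketch does not reproduce how BCL computes them.

```python
import numpy as np

def weighted_contrastive_loss(pos_sim, neg_sims, weights, t=0.5):
    """Importance-weighted InfoNCE-style loss for one anchor (illustrative only).

    pos_sim  : similarity between the anchor and its positive (scalar)
    neg_sims : similarities between the anchor and N unlabeled samples, shape (N,)
    weights  : hypothetical importance weights, one per unlabeled sample, shape (N,)
    t        : temperature scaling the similarities
    """
    neg_sims = np.asarray(neg_sims, dtype=float)
    weights = np.asarray(weights, dtype=float)
    pos_term = np.exp(pos_sim / t)
    # Each negative contributes exp(sim / t), scaled by its weight.
    neg_term = np.sum(weights * np.exp(neg_sims / t))
    return -np.log(pos_term / (pos_term + neg_term))

# Three unlabeled samples; the first is suspiciously similar to the anchor.
neg = [0.9, 0.1, -0.3]
uniform = weighted_contrastive_loss(0.8, neg, [1.0, 1.0, 1.0])
reweighted = weighted_contrastive_loss(0.8, neg, [0.1, 1.0, 1.0])
```

With all weights equal to 1 the function reduces to the ordinary unweighted loss. Shrinking the weight of the first sample, which behaves like a likely false negative, reduces how strongly the loss pushes it away from the anchor.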

Paper Structure (31 sections, 8 theorems, 60 equations, 10 figures, 10 tables, 4 algorithms)


Key Result

Proposition 3.1

If $\phi(\hat{x})$ is a continuous density function that satisfies $\phi(\hat{x}) \geq 0$ and $\int_{-\infty}^{+\infty}\phi(\hat{x})\, d\hat{x} = 1$, then $\phi_{\textsc{Tn}}(\hat{x})$ and $\phi_{\textsc{Fn}}(\hat{x})$ are probability density functions that satisfy $\phi_{\textsc{Tn}}(\hat{x})\geq 0$, …
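
As a hedged aside on how class-conditional densities of this kind are typically used: suppose a randomly drawn unlabeled sample is a true negative with prior probability $\tau^{-}$ and a false negative with probability $\tau^{+} = 1 - \tau^{-}$. These two symbols are assumptions introduced here for illustration and do not come from the text above. Bayes' rule then gives the posterior probability that a sample with score $\hat{x}$ is a true negative. This is a generic sketch, not a reproduction of the paper's Lemma 3.2.

$$
p(\textsc{Tn} \mid \hat{x}) = \frac{\tau^{-}\,\phi_{\textsc{Tn}}(\hat{x})}{\tau^{-}\,\phi_{\textsc{Tn}}(\hat{x}) + \tau^{+}\,\phi_{\textsc{Fn}}(\hat{x})}
$$

For example, at a score where the two densities are equal, the posterior collapses to the prior $\tau^{-}$, so the score carries no extra evidence either way.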

Figures (10)

  • Figure 1: False negative samples and hard negative samples.
  • Figure 2: Two possible cases for the relative positions of anchor, positive, and negative triples.
  • Figure 3: $\phi_{\textsc{Tn}}(\hat{x})$ and $\phi_{\textsc{Fn}}(\hat{x})$ with different $\alpha$ settings.
  • Figure 4: Generation process of observation $\exp(\hat{x}/t)$.
  • Figure 5: Empirical distribution of $\exp(\hat{x}/t)$ with different $\alpha$.
  • ...and 5 more figures

Theorems & Definitions (16)

  • Proposition 3.1: Class Conditional Density
  • proof
  • Lemma 3.2: Posterior Probability Estimation
  • proof
  • Lemma 3.3: Asymptotic Unbiased Estimation
  • proof
  • Lemma 3.4: Estimation Error Bound
  • proof
  • Proposition 1.1: Class Conditional Density
  • proof
  • ...and 6 more