Bayesian Self-Supervised Contrastive Learning
Bin Liu, Bang Wang, Tianrui Li
TL;DR
This work tackles the challenge of false negatives in self-supervised contrastive learning by introducing Bayesian Self-Supervised Contrastive Learning (BCL), which reweights unlabeled negatives using importance sampling. By designing a target sampling distribution with parameters $\alpha$ (debiasing) and $\beta$ (hard negative mining), BCL computes weights $\omega_i$ that reflect the posterior probability that a sample is a true negative, enabling debiasing and harder negative separation without parametric density assumptions. The authors prove that, with $\beta=0.5$, $\mathcal{L}_{\text{Bcl}}$ is asymptotically unbiased with respect to the supervised loss $\mathcal{L}_{\text{sup}}$, and show finite-$N$ error bounds. Empirically, BCL improves performance over strong baselines like SimCLR, DCL, and HCL on multiple datasets and tasks, while maintaining comparable runtime, and provides practical guidance on hyperparameters and negative sample size. Overall, BCL offers a principled, non-parametric pathway to leverage unlabeled data more effectively in contrastive learning, with potential to enhance downstream representation quality.
Abstract
Recent years have witnessed many successful applications of contrastive learning in diverse domains, yet its self-supervised version still poses many exciting challenges. Because negative samples are drawn from unlabeled datasets, a randomly selected sample may actually be a false negative for an anchor, leading to incorrect encoder training. This paper proposes a new self-supervised contrastive loss, the BCL loss, which still uses random samples from the unlabeled data while correcting the resulting bias with importance weights. The key idea is to design, under the Bayesian framework, a desired sampling distribution for drawing hard true negative samples. Its main advantage is that the desired sampling distribution has a parametric structure, with a location parameter for debiasing false negatives and a concentration parameter for mining hard negatives. Experiments validate the effectiveness and superiority of the BCL loss.
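
To make the idea of correcting bias with importance weights concrete, the following is a minimal sketch of a generic importance-weighted InfoNCE loss for a single anchor. It is not the BCL loss itself: the exact form of the weights $\omega_i$ is not given in this summary, so the sketch takes them as inputs. The function name `weighted_info_nce`, the temperature default, and the rescaling of weights to mean 1 are all assumptions made for illustration.

```python
import math


def weighted_info_nce(pos_sim, neg_sims, weights=None, temperature=0.5):
    """Importance-weighted InfoNCE loss for one anchor (illustrative sketch).

    pos_sim:  similarity between the anchor and its positive view.
    neg_sims: similarities between the anchor and N unlabeled samples.
    weights:  nonnegative importance weights, one per unlabeled sample.
              These stand in for the omega_i of the text; how they are
              computed is NOT shown here. None means uniform weights.
    """
    n = len(neg_sims)
    if n == 0:
        raise ValueError("at least one unlabeled sample is required")
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("weights and neg_sims must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must not all be zero")

    # Assumption: rescale weights to average 1, so that uniform weights
    # recover the standard (unweighted) InfoNCE loss.
    scaled = [w * n / total for w in weights]

    pos_term = math.exp(pos_sim / temperature)
    neg_term = sum(w * math.exp(s / temperature)
                   for w, s in zip(scaled, neg_sims))
    return -math.log(pos_term / (pos_term + neg_term))


# Usage: the first unlabeled sample is very similar to the anchor, so it
# may be a false negative; a small weight reduces its influence.
sims = [0.9, 0.1, -0.2]
uniform_loss = weighted_info_nce(0.8, sims)
reweighted_loss = weighted_info_nce(0.8, sims, weights=[0.1, 1.0, 1.0])
```

In this sketch, a sample with a low weight contributes less to the denominator, which is the mechanism by which a suspected false negative is kept from pushing the anchor away from a same-class sample. Conversely, a larger weight on a high-similarity sample would emphasize it as a hard negative.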
