Understanding Contrastive Representation Learning from Positive Unlabeled (PU) Data

Anish Acharya, Li Jing, Bhargav Bhushanam, Dhruv Choudhary, Michael Rabbat, Sujay Sanghavi, Inderjit S Dhillon

TL;DR

The paper tackles learning representations from Positive-Unlabeled data by adapting contrastive learning to the PU setting. It introduces puCL, a bias-free, variance-reducing variant that leverages labeled positives while using unlabeled data only through self-supervision, and puNCE, which incorporates class-prior information as soft, probabilistic mixtures. To bridge to downstream tasks, it proposes puPL, a PU-aware clustering-based pseudo-labeling approach with provable guarantees, enabling effective linear classification without labeled negatives. The framework is supported by a bias-variance analysis, convergence and generalization guarantees, and extensive experiments showing consistent gains over existing PU methods, especially in low-supervision regimes. Overall, the work provides a cohesive, theoretically grounded pipeline for PU contrastive learning with strong empirical performance and practical scalability.

Abstract

Pretext Invariant Representation Learning (PIRL) followed by Supervised Fine-Tuning (SFT) has become a standard paradigm for learning with limited labels. We extend this approach to the Positive Unlabeled (PU) setting, where only a small set of labeled positives and a large unlabeled pool, containing both positives and negatives, are available. We study this problem under two regimes: (i) without access to the class prior, and (ii) when the prior is known or can be estimated. We introduce Positive Unlabeled Contrastive Learning (puCL), an unbiased and variance-reducing contrastive objective that judiciously integrates weak supervision from labeled positives into the contrastive loss. When the class prior is known, we propose Positive Unlabeled InfoNCE (puNCE), a prior-aware extension that re-weights unlabeled samples as soft positive-negative mixtures. For downstream classification, we develop a pseudo-labeling algorithm that leverages the structure of the learned embedding space via PU-aware clustering. Our framework is supported by theory, offering bias-variance analysis, convergence insights, and generalization guarantees via augmentation concentration, and is validated empirically across standard PU benchmarks, where it consistently outperforms existing methods, particularly in low-supervision regimes.
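
To make the puCL idea concrete, here is a minimal NumPy sketch of a contrastive loss in which labeled positives attract one another while each unlabeled sample is pulled only toward its own augmented view. It is an illustration under stated assumptions, not the paper's exact objective: the function name `pucl_loss`, the arguments `is_labeled_pos` and `temperature`, the cosine normalization, and the choice to average log-probabilities over each anchor's positive set are all assumptions made for this example.

```python
import numpy as np

def pucl_loss(z1, z2, is_labeled_pos, temperature=0.5):
    """Sketch of a PU-aware contrastive loss (assumed form, not the paper's code).

    z1, z2:          (n, d) embeddings of two augmented views of the same n samples.
    is_labeled_pos:  (n,) booleans; True for labeled positives, False for unlabeled.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    labeled = np.asarray(is_labeled_pos, dtype=bool)
    labeled = np.concatenate([labeled, labeled])

    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # an anchor never contrasts with itself

    # Row-wise log-softmax over the remaining 2n - 1 views.
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))

    # Positive set: the anchor's other view, plus (for labeled anchors)
    # every view of every other labeled positive.
    idx = np.arange(2 * n)
    pos = np.zeros((2 * n, 2 * n), dtype=bool)
    pos[idx, (idx + n) % (2 * n)] = True
    pos |= labeled[:, None] & labeled[None, :]
    np.fill_diagonal(pos, False)

    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1)
    return float(per_anchor.mean())

# Usage on random embeddings: 8 samples, the first 3 are labeled positives.
rng = np.random.default_rng(0)
view_a, view_b = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
mask = np.array([True, True, True, False, False, False, False, False])
print(pucl_loss(view_a, view_b, mask))
```

With an all-False mask the sketch reduces to a purely self-supervised loss, which mirrors the statement that unlabeled data enter only through self-supervision.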

Paper Structure (44 sections, 11 theorems, 121 equations, 14 figures, 4 tables)

Key Result

Lemma 1

Consider learning a binary classifier (P vs. N) in the presence of class-dependent label noise with noise rates $E(\xi_\textsc{P}) = \frac{\pi}{\gamma + \pi},\; \xi_\textsc{N} = 0$. Without additional distributional assumptions, no robust estimator can guarantee a bounded risk estimate if: where $\gamma = \frac{n_\textsc{P}}{n_\textsc{U}}$ and $\pi = p(y=1|{\mathbf{x}})$ denotes the underlying class pri
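
The stated noise rate can be recovered by a short counting argument. The sketch below assumes a reading that the excerpt does not spell out: every unlabeled sample is treated as a negative, $\pi$ is the fraction of positives in the unlabeled pool, and the random number of hidden positives is replaced by its expectation $\pi\, n_\textsc{U}$.

```latex
\begin{align*}
\text{positives labeled P} &= n_\textsc{P}, \\
\text{positives labeled N (expected)} &= \pi\, n_\textsc{U}, \\
E(\xi_\textsc{P}) &\approx \frac{\pi\, n_\textsc{U}}{n_\textsc{P} + \pi\, n_\textsc{U}}
  = \frac{\pi}{\frac{n_\textsc{P}}{n_\textsc{U}} + \pi}
  = \frac{\pi}{\gamma + \pi}, \\
\xi_\textsc{N} &= 0 \quad \text{(no true negative is ever labeled P).}
\end{align*}
```

For instance, with $\gamma = 0.1$ and $\pi = 0.4$ this gives $E(\xi_\textsc{P}) = 0.4 / 0.5 = 0.8$: under this reading, roughly four out of five positives would carry the wrong label.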

Figures (14)

  • Figure 1: Positive Unlabeled Learning. No negative examples are labeled; a binary classifier needs to be trained using a set of labeled positives $\sim p_\textsc{P}({\textnormal{x}})$ and a set of unlabeled samples $\sim p_\textsc{U}({\textnormal{x}}) = \pi_\textsc{P}p_\textsc{P}({\textnormal{x}})+ (1 - \pi_\textsc{P})p_\textsc{N}({\textnormal{x}})$, the mixture distribution of the positive and negative (unobserved) class marginals.
  • Figure 2: Ablations over Varying $\kappa$: ResNet-34 trained on ImageNet-I. (a) Variation of $\kappa$ w.r.t. the class prior ($\pi_p$) and PU supervision ($\gamma$). (b) Generalization performance of contrastive objectives with varying $\kappa$. (c) 2D visualization of (b) across each loss.
  • Figure 3: Mixed Contrastive Learning: ResNet-18 trained on CIFAR-III (vehicle vs. animal). (a) Variation of $\kappa$ w.r.t. the class prior ($\pi_p$) and PU supervision ($\gamma$). (b) Generalization performance of contrastive objectives with varying $\kappa$.
  • Figure 4: Embedding Quality vs. Supervision Ratio ($\gamma$). We visualize the learned feature embeddings (t-SNE) from a ResNet-18 trained on the ImageNet-II dataset using different contrastive learning methods. The supervision ratio $\gamma = n_{\textsc{P}} / n_{\textsc{U}}$ controls the proportion of labeled positives, while the total number of training samples $N = n_{\textsc{P}} + n_{\textsc{U}}$ is held fixed. Compared to the unsupervised baseline ssCL, our proposed puCL yields substantially improved class separability, which improves consistently with increasing $\gamma$. This highlights the benefit of incorporating even limited supervision. The fully supervised sCL serves as an upper bound in terms of embedding structure with similar training hyper-parameters.
  • Figure 5: Convergence: Training ResNet-18 on (a) CIFAR-0 and (b) ImageNet-II. By incorporating more labeled positives, puCL converges faster than ssCL.
  • ...and 9 more figures
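
The data setup described in the captions of Figures 1 and 4 can be sketched as follows: labeled positives come from $p_\textsc{P}$, unlabeled samples come from the mixture $\pi_\textsc{P}\, p_\textsc{P} + (1 - \pi_\textsc{P})\, p_\textsc{N}$, and the supervision ratio $\gamma = n_\textsc{P} / n_\textsc{U}$ splits a fixed budget $N = n_\textsc{P} + n_\textsc{U}$. This is only an illustration: the helper names `split_sizes` and `sample_pu` and the toy Gaussian class marginals are assumptions, not part of the paper.

```python
import numpy as np

def split_sizes(total, gamma):
    """Return (n_p, n_u) with n_p + n_u == total and n_p / n_u close to gamma."""
    n_u = int(round(total / (1.0 + gamma)))
    return total - n_u, n_u

def sample_pu(sample_pos, sample_neg, total, gamma, prior, rng):
    """Draw labeled positives from p_P and unlabeled points from the PU mixture."""
    n_p, n_u = split_sizes(total, gamma)
    labeled_pos = sample_pos(n_p, rng)
    is_pos = rng.random(n_u) < prior            # hidden labels, never shown to the learner
    k = int(is_pos.sum())
    unlabeled = np.empty((n_u,) + labeled_pos.shape[1:])
    unlabeled[is_pos] = sample_pos(k, rng)
    unlabeled[~is_pos] = sample_neg(n_u - k, rng)
    return labeled_pos, unlabeled, is_pos

# Toy class marginals (assumed): two 2-D Gaussians.
pos = lambda n, r: r.normal(2.0, 1.0, size=(n, 2))
neg = lambda n, r: r.normal(-2.0, 1.0, size=(n, 2))

rng = np.random.default_rng(0)
x_p, x_u, hidden = sample_pu(pos, neg, total=1200, gamma=0.2, prior=0.4, rng=rng)
print(x_p.shape, x_u.shape)
```

Holding `total` fixed while varying `gamma` mirrors the protocol in the Figure 4 caption, where more labeled positives mean fewer unlabeled samples.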

Theorems & Definitions (18)

  • Definition 1: Breakdown point
  • Lemma 1
  • Definition 2: Invariance under Transformation
  • Theorem 1
  • Lemma 2
  • Definition 3: Clustering
  • Definition 4: Potential Function
  • Theorem 2: Clustering Quality of puPL
  • Definition 5: ($\sigma, \delta$) Augmentation
  • Definition 6: Augmentation Distance
  • ...and 8 more
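
The list above names a clustering definition, a potential function, and a clustering-quality theorem for puPL, while the abstract says only that pseudo-labels come from PU-aware clustering of the learned embeddings. The sketch below shows one plausible shape such a procedure could take and should not be read as the paper's algorithm. Its assumptions are: a two-cluster Lloyd iteration, a positive centroid seeded at the mean of the labeled positives, a negative centroid seeded at the unlabeled point farthest from it, and a potential equal to the usual sum of squared distances to the nearest centroid. The name `pu_two_means` is hypothetical.

```python
import numpy as np

def pu_two_means(z_pos, z_unl, n_iter=20):
    """Pseudo-label unlabeled embeddings with a 2-means seeded by labeled positives.

    Returns (labels, potential): labels[i] is 1 if z_unl[i] lands in the
    positive cluster, else 0; potential is the sum of squared distances of
    unlabeled points to their nearest centroid.
    """
    def sq_dist(points, center):
        return ((points - center) ** 2).sum(axis=1)

    # Assumed PU-aware seeding: labeled positives anchor the positive centroid.
    c_pos = z_pos.mean(axis=0)
    c_neg = z_unl[sq_dist(z_unl, c_pos).argmax()]

    for _ in range(n_iter):
        labels = (sq_dist(z_unl, c_pos) <= sq_dist(z_unl, c_neg)).astype(int)
        # Labeled positives always stay in the positive cluster.
        c_pos = np.concatenate([z_pos, z_unl[labels == 1]], axis=0).mean(axis=0)
        if (labels == 0).any():
            c_neg = z_unl[labels == 0].mean(axis=0)

    # Final assignment and potential under the converged centroids.
    d_pos, d_neg = sq_dist(z_unl, c_pos), sq_dist(z_unl, c_neg)
    labels = (d_pos <= d_neg).astype(int)
    return labels, float(np.minimum(d_pos, d_neg).sum())

# Usage on toy embeddings (assumed data): two separated 2-D blobs.
rng = np.random.default_rng(0)
z_pos = rng.normal(3.0, 1.0, size=(20, 2))
hidden = rng.random(500) < 0.4
z_unl = np.where(hidden[:, None],
                 rng.normal(3.0, 1.0, size=(500, 2)),
                 rng.normal(-3.0, 1.0, size=(500, 2)))
labels, potential = pu_two_means(z_pos, z_unl)
print("pseudo-label agreement with hidden labels:", (labels == hidden).mean())
```

The resulting pseudo-labels could then be used to fit a linear classifier on the frozen embeddings, which matches the downstream use described in the abstract.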