Understanding Contrastive Representation Learning from Positive Unlabeled (PU) Data

Anish Acharya; Li Jing; Bhargav Bhushanam; Dhruv Choudhary; Michael Rabbat; Sujay Sanghavi; Inderjit S Dhillon

TL;DR

The paper tackles learning representations from Positive-Unlabeled data by adapting contrastive learning to the PU setting. It introduces puCL, a bias-free, variance-reducing variant that leverages labeled positives while using unlabeled data only through self-supervision, and puNCE, which incorporates class-prior information as soft, probabilistic mixtures. To bridge to downstream tasks, it proposes puPL, a PU-aware clustering-based pseudo-labeling approach with provable guarantees, enabling effective linear classification without labeled negatives. The framework is supported by a bias-variance analysis, convergence and generalization guarantees, and extensive experiments showing consistent gains over existing PU methods, especially in low-supervision regimes. Overall, the work provides a cohesive, theoretically grounded pipeline for PU contrastive learning with strong empirical performance and practical scalability.

Abstract

Pretext Invariant Representation Learning (PIRL) followed by Supervised Fine-Tuning (SFT) has become a standard paradigm for learning with limited labels. We extend this approach to the Positive Unlabeled (PU) setting, where only a small set of labeled positives and a large unlabeled pool containing both positives and negatives are available. We study this problem under two regimes: (i) without access to the class prior, and (ii) when the prior is known or can be estimated. We introduce Positive Unlabeled Contrastive Learning (puCL), an unbiased and variance-reducing contrastive objective that judiciously integrates weak supervision from labeled positives into the contrastive loss. When the class prior is known, we propose Positive Unlabeled InfoNCE (puNCE), a prior-aware extension that re-weights unlabeled samples as soft positive-negative mixtures. For downstream classification, we develop a pseudo-labeling algorithm that leverages the structure of the learned embedding space via PU-aware clustering. Our framework is supported by theory, including a bias-variance analysis, convergence insights, and generalization guarantees via augmentation concentration, and is validated empirically across standard PU benchmarks, where it consistently outperforms existing methods, particularly in low-supervision regimes.
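
To make the puCL idea concrete, the following is a minimal sketch based only on the description above, not the authors' implementation: labeled positives attract one another (and their own augmented views), while each unlabeled sample attracts only its own augmented view. The function names, the temperature value, and the exact normalization are assumptions made for illustration; the paper's objective may differ in detail.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def pu_contrastive_loss(z1, z2, is_labeled_pos, temperature=0.5):
    """Hypothetical puCL-style loss (illustrative sketch).

    z1[i], z2[i]: embeddings of two augmented views of sample i.
    is_labeled_pos[i]: True if sample i is a labeled positive,
    False if it comes from the unlabeled pool.
    """
    n = len(z1)
    views = [normalize(v) for v in z1] + [normalize(v) for v in z2]
    labeled = list(is_labeled_pos) * 2
    total = 0.0
    for a in range(2 * n):
        # Every view except the anchor itself competes in the softmax.
        others = [j for j in range(2 * n) if j != a]
        logits = {j: dot(views[a], views[j]) / temperature for j in others}
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        partner = (a + n) % (2 * n)  # the other view of the same sample
        if labeled[a]:
            # Labeled anchor: all labeled views (including its partner) are positives.
            positives = [j for j in others if labeled[j]]
        else:
            # Unlabeled anchor: self-supervision only.
            positives = [partner]
        total -= sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / (2 * n)


# Toy batch: two labeled positives and two unlabeled samples in 2D.
z1 = [[1.0, 0.1], [0.9, 0.2], [-1.0, 0.3], [0.2, 1.0]]
z2 = [[1.0, 0.0], [0.8, 0.3], [-0.9, 0.2], [0.1, 0.9]]
print(pu_contrastive_loss(z1, z2, [True, True, False, False]))
```

With no labeled positives in the batch, this sketch reduces to a standard self-supervised InfoNCE loss, which matches the role ssCL plays as the unsupervised baseline.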

Paper Structure (44 sections, 11 theorems, 121 equations, 14 figures, 4 tables)

Introduction
Overview: Contrastive Approach to PU Learning
Contributions
Related Work
Problem Setup
Background
Reduction of PU Learning to Learning with Label Noise
Cost Sensitive PU Learning
Limitations of Cost Sensitive Approaches
Class Prior Estimate.
Low Supervision Regime.
Contrastive Representation Learning from PU Data
Self Supervised Contrastive Learning (ssCL)
Projection Network.
Supervised Contrastive Learning (sCL)
...and 29 more sections

Key Result

Lemma 1

Consider learning a binary classifier (P vs N) in the presence of class-dependent label noise with noise rates $E(\xi_{\textsc{P}}) = \frac{\pi}{\gamma + \pi},\; \xi_{\textsc{N}} = 0$. Without additional distributional assumptions, no robust estimator can guarantee a bounded risk estimate if: where $\gamma = \frac{n_{\textsc{P}}}{n_{\textsc{U}}}$ and $\pi = p(y=1|{\mathbf{x}})$ denotes the underlying class prior ...
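
The noise rate in the lemma can be checked with a small worked example. The sketch below assumes the reduction in which every unlabeled sample is treated as a negative and $\pi$ is read as the fraction of positives in the unlabeled pool; under that assumption the expected share of true positives that end up labeled negative is $\pi n_{\textsc{U}} / (n_{\textsc{P}} + \pi n_{\textsc{U}}) = \pi / (\gamma + \pi)$. The function names are hypothetical.

```python
def pu_noise_rate(n_labeled_pos, n_unlabeled, class_prior):
    """Expected positive-class noise rate pi / (gamma + pi), with gamma = n_P / n_U."""
    gamma = n_labeled_pos / n_unlabeled
    return class_prior / (gamma + class_prior)


def pu_noise_rate_by_counting(n_labeled_pos, n_unlabeled, class_prior):
    """Same quantity, derived by counting expected mislabeled positives."""
    hidden_positives = class_prior * n_unlabeled  # positives inside the unlabeled pool
    return hidden_positives / (n_labeled_pos + hidden_positives)


# Example: 1,000 labeled positives, 10,000 unlabeled samples, half of them positive.
print(pu_noise_rate(1000, 10000, 0.5))
print(pu_noise_rate_by_counting(1000, 10000, 0.5))
```

Both functions agree, and the rate approaches 1 as $\gamma$ shrinks, which is the low-supervision regime singled out in the outline.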

Figures (14)

Figure 1: Positive Unlabeled Learning. No negative examples are labeled; a binary classifier needs to be trained using a set of labeled positives drawn from $p_{\textsc{P}}({\textnormal{x}})$ and a set of unlabeled samples drawn from $p_{\textsc{U}}({\textnormal{x}}) = \pi_{\textsc{P}} p_{\textsc{P}}({\textnormal{x}}) + (1 - \pi_{\textsc{P}}) p_{\textsc{N}}({\textnormal{x}})$, the mixture distribution of the positive and negative (unobserved) class marginals.
Figure 2: (Ablations over Varying $\kappa$) ResNet-34 trained on ImageNet-I. (a) Variation of $\kappa$ w.r.t. the class prior ($\pi_p$) and PU supervision ($\gamma$). (b) Generalization performance of contrastive objectives with varying $\kappa$. (c) 2D visualization of (b) across each loss.
Figure 3: Mixed Contrastive Learning. ResNet-18 trained on CIFAR-III (vehicle vs animal). (a) Variation of $\kappa$ w.r.t. the class prior ($\pi_p$) and PU supervision ($\gamma$). (b) Generalization performance of contrastive objectives with varying $\kappa$.
Figure 4: Embedding Quality vs. Supervision Ratio ($\gamma$). We visualize the learned feature embeddings (t-SNE) from a ResNet-18 trained on the ImageNet-II dataset using different contrastive learning methods. The supervision ratio $\gamma = n_{\textsc{P}} / n_{\textsc{U}}$ controls the proportion of labeled positives, while the total number of training samples $N = n_{\textsc{P}} + n_{\textsc{U}}$ is held fixed. Compared to the unsupervised baseline ssCL, our proposed puCL yields substantially improved class separability, which improves consistently with increasing $\gamma$. This highlights the benefit of incorporating even limited supervision. The fully supervised sCL serves as an upper bound in terms of embedding structure with similar training hyper-parameters.
Figure 5: Convergence: Training ResNet-18 on (a) CIFAR-0 and (b) ImageNet-II. By incorporating more labeled positives, puCL enjoys a convergence speedup over ssCL.
...and 9 more figures

Theorems & Definitions (18)

Definition 1: Breakdown point
Lemma 1
Definition 2: Invariance under Transformation
Theorem 1
Lemma 2
Definition 3: Clustering
Definition 4: Potential Function
Theorem 2: Clustering Quality of puPL
Definition 5: ($\sigma, \delta$) Augmentation
Definition 6: Augmentation Distance
...and 8 more
