Noisy-Pair Robust Representation Alignment for Positive-Unlabeled Learning

Hengwei Zhao, Zhengzhong Tu, Zhuo Zheng, Wei Wang, Junjue Wang, Rusty Feagin, Wenzhe Jiao

TL;DR

The paper targets the core bottleneck in positive-unlabeled learning: learning discriminative representations under unreliable supervision. It introduces NcPU, a non-contrastive PU framework that combines NoiSNCL, a noisy-pair robust intra-class alignment loss, with PLD, a phantom label disambiguation scheme based on class prototypes and regret-based updates. The authors provide EM-inspired theoretical justification showing how NoiSNCL and PLD mutually reinforce each other, and they demonstrate substantial empirical gains across standard benchmarks and challenging remote-sensing datasets without requiring auxiliary negatives or priors. The results reveal that NcPU closes much of the gap to supervised performance and offers robust, scalable performance for real-world weakly supervised tasks, with broad applicability in areas such as post-disaster building damage mapping. The work suggests a promising direction for non-contrastive, prototype-informed PU learning and broader weak supervision contexts, with code to be released after review.

Abstract

Positive-Unlabeled (PU) learning aims to train a binary classifier (positive vs. negative) where only limited positive data and abundant unlabeled data are available. While widely applicable, state-of-the-art PU learning methods substantially underperform their supervised counterparts on complex datasets, especially without auxiliary negatives or pre-estimated parameters (e.g., a 14.26% gap on the CIFAR-100 dataset). We identify the primary bottleneck as the challenge of learning discriminative representations under unreliable supervision. To tackle this challenge, we propose NcPU, a non-contrastive PU learning framework that requires no auxiliary information. NcPU combines a noisy-pair robust supervised non-contrastive loss (NoiSNCL), which aligns intra-class representations despite unreliable supervision, with a phantom label disambiguation (PLD) scheme that supplies conservative negative supervision via regret-based label updates. Theoretically, NoiSNCL and PLD can iteratively benefit each other from the perspective of the Expectation-Maximization framework. Empirically, extensive experiments demonstrate that: (1) NoiSNCL enables simple PU methods to achieve competitive performance; and (2) NcPU achieves substantial improvements over state-of-the-art PU methods across diverse datasets, including challenging datasets on post-disaster building damage mapping, highlighting its promise for real-world applications. Code will be open-sourced after review.

Paper Structure

This paper contains 23 sections, 1 theorem, 39 equations, 13 figures, 9 tables, 1 algorithm.

Key Result

Theorem 1

Assume the distribution of each class in the representation space follows a $d$-variate von Mises-Fisher (vMF) distribution, which leads to: $h(\bm{x}|\tilde{\bm{\nu}}_c,\kappa)=c_d(\kappa)e^{\kappa \tilde{\bm{\nu}}_c^\top \tilde{g}(\bm{x})}$, where $\tilde{\bm{\nu}}_c = \bm{\nu}_c/\left\lVert \bm{\
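
To make the vMF assumption in Theorem 1 concrete, the following is a minimal sketch (not the authors' implementation) of how the density $c_d(\kappa)e^{\kappa \tilde{\bm{\nu}}_c^\top \tilde{g}(\bm{x})}$ can be turned into a class posterior. It assumes equal class priors, a concentration $\kappa$ shared across classes, and that $\tilde{g}(\bm{x})$ denotes an L2-normalized representation; none of these is confirmed by the truncated statement above. Under those assumptions the normalizer $c_d(\kappa)$ is identical for every class and cancels, so the posterior reduces to a softmax over scaled cosine similarities to the class prototypes. The function names `l2_normalize` and `vmf_class_posterior` are hypothetical.

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Project each row of v onto the unit sphere."""
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norms, eps)


def vmf_class_posterior(features, prototypes, kappa):
    """Class posterior under per-class vMF densities.

    Assumptions (illustrative, not taken from the paper): equal class
    priors and one shared kappa, so c_d(kappa) cancels in Bayes' rule.

    features:   (n, d) raw representations g(x)
    prototypes: (C, d) raw class prototypes nu_c
    returns:    (n, C) posterior probabilities, rows sum to 1
    """
    z = l2_normalize(np.asarray(features, dtype=float))
    nu = l2_normalize(np.asarray(prototypes, dtype=float))
    logits = kappa * (z @ nu.T)                   # kappa * nu_c^T g(x)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


# Two toy 2-D representations and two prototypes (positive, negative).
features = np.array([[2.0, 0.0], [0.0, 3.0]])
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
posterior = vmf_class_posterior(features, prototypes, kappa=5.0)
```

A larger `kappa` sharpens the posterior toward the nearest prototype, while a smaller one keeps assignments soft; how the paper actually sets or uses $\kappa$ is not visible in this excerpt.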

Figures (13)

  • Figure 1: Illustration of different representation learning methods. Representation learning can acquire discriminative representations either by pulling same-class samples closer to the anchor and pushing different-class samples apart (contrastive representation learning), or by only pulling same-class samples closer to the anchor (non-contrastive representation learning). (a) Self-supervised representation learning: same-class pairs from augmented anchor. (b) Supervised representation learning: same-class pairs from reliable labels. (c) Noisy-pair robust representation learning: same-class pairs from unreliable labels.
  • Figure 2: t-SNE visualizations of the representations learned by PU methods on CIFAR-10 training dataset.
  • Figure 3: The proposed NcPU framework. NoiSNCL improves representations for label disambiguation, while PLD enhances supervision for representation learning.
  • Figure 4: t-SNE visualizations of the representations learned by risk estimation methods on CIFAR-10 training dataset.
  • Figure 5: Analyses of hyperparameters on the CIFAR-10 dataset.
  • ...and 8 more figures

Theorems & Definitions (1)

  • Theorem 1