Angular Regularization for Positive-Unlabeled Learning on the Hypersphere

Vasileios Sevetlidis, George Pavlidis, Antonios Gasteratos

TL;DR

This work addresses Positive-Unlabeled learning by proposing AngularPU, a geometry-first framework that embeds inputs on the unit hypersphere and uses a learnable positive prototype measured by cosine similarity. The method combines a vMF-inspired directional score with an unlabeled angular uniformity regularizer, enabling end-to-end training without negative sampling or class-prior estimation. Theoretical analysis links the angular decision rule to a vMF–uniform generative model and shows the learnability of the prototype, while empirical results across CIFAR-10, STL-10, SVHN, and ADNI demonstrate competitive or superior performance, especially in scarce-positive and high-dimensional settings. Ablation studies corroborate the necessity of the regularizer and margin components, and the approach offers scalable, interpretable PU learning with robust recall characteristics.

Abstract

Positive-Unlabeled (PU) learning addresses classification problems where only a subset of positive examples is labeled and the remaining data is unlabeled, making explicit negative supervision unavailable. Existing PU methods often rely on negative-risk estimation or pseudo-labeling, which either require strong distributional assumptions or can collapse in high-dimensional settings. We propose AngularPU, a novel PU framework that operates on the unit hypersphere using cosine similarity and angular margin. In our formulation, the positive class is represented by a learnable prototype vector, and classification reduces to thresholding the cosine similarity between an embedding and this prototype, eliminating the need for explicit negative modeling. To counteract the tendency of unlabeled embeddings to cluster near the positive prototype, we introduce an angular regularizer that encourages dispersion of the unlabeled set over the hypersphere, improving separation. We provide theoretical guarantees on the Bayes-optimality of the angular decision rule, consistency of the learned prototype, and the effect of the regularizer on the unlabeled distribution. Experiments on benchmark datasets demonstrate that AngularPU achieves competitive or superior performance compared to state-of-the-art PU methods, particularly in settings with scarce positives and high-dimensional embeddings, while offering geometric interpretability and scalability.
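
The prototype-based decision rule and the idea of dispersing unlabeled embeddings can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names, the fixed `threshold=0.5` (the paper describes a margin, which may be learnable), and the mean pairwise cosine used as a dispersion measure, which is one simple choice and not necessarily the regularizer the authors use.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Project vectors (last axis) onto the unit hypersphere."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def cosine_scores(embeddings, prototype):
    """Cosine similarity between each embedding and the positive prototype."""
    z = l2_normalize(np.asarray(embeddings, dtype=float))
    mu = l2_normalize(np.asarray(prototype, dtype=float))
    return z @ mu

def predict_positive(embeddings, prototype, threshold=0.5):
    """Predict positive when the cosine score reaches the threshold.

    The threshold value is hypothetical, chosen only for this example.
    """
    return cosine_scores(embeddings, prototype) >= threshold

def mean_pairwise_cosine(unlabeled):
    """Illustrative dispersion measure (an assumption, not the paper's loss).

    Mean cosine similarity over distinct pairs; lower values mean the
    embeddings are more spread out. Requires at least two rows.
    """
    z = l2_normalize(np.asarray(unlabeled, dtype=float))
    n = z.shape[0]
    gram = z @ z.T
    # Remove the diagonal (self-similarity, always 1) before averaging.
    return (gram.sum() - np.trace(gram)) / (n * (n - 1))

# Toy 2-D embeddings and a prototype pointing along the first axis.
embeddings = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
prototype = [1.0, 0.0]
labels = predict_positive(embeddings, prototype)
```

In this toy example the first and third embeddings lie within the angular threshold of the prototype, while the second is orthogonal to it. Minimizing a quantity like `mean_pairwise_cosine` over unlabeled embeddings would push them apart on the sphere, which is the qualitative effect the abstract attributes to the angular regularizer.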

Paper Structure

This paper contains 23 sections, 16 theorems, 51 equations, 6 figures, 5 tables.

Key Result

Proposition 4.1

Let $z\in\mathbb S^{d-1}$, with $p(z\mid Y{=}1)=C_d(\kappa)\exp(\kappa\,\mu^\top z)$ and $p(z\mid Y{=}0)=U_d$, and prior $\pi=\Pr(Y{=}1)$. Then the Bayes–optimal classifier is a threshold on the inner product:
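
Under the stated model, the threshold form follows from comparing the two class posteriors. The derivation below is a sketch: the symbol $\tau$ is our notation and not necessarily the paper's, and we assume $\kappa>0$ and $0<\pi<1$.

```latex
\begin{align*}
\Pr(Y{=}1\mid z)\ \ge\ \Pr(Y{=}0\mid z)
&\iff \pi\,C_d(\kappa)\exp(\kappa\,\mu^\top z)\ \ge\ (1-\pi)\,U_d \\
&\iff \kappa\,\mu^\top z\ \ge\ \log\frac{(1-\pi)\,U_d}{\pi\,C_d(\kappa)} \\
&\iff \mu^\top z\ \ge\ \tau,
\qquad \tau := \frac{1}{\kappa}\,\log\frac{(1-\pi)\,U_d}{\pi\,C_d(\kappa)} .
\end{align*}
```

Because $\mu$ and $z$ are unit vectors, $\mu^\top z$ is the cosine of the angle between them, so the rule amounts to an angular threshold around the direction $\mu$. In this form $\tau$ decreases as the prior $\pi$ increases.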

Figures (6)

  • Figure 1: Overview of our hyperspherical PU method. A shared encoder $f_\theta$ maps labeled positives $x^p$ and unlabeled samples $x^u$ onto the unit hypersphere. Positives are pulled toward a learnable direction $\mu$ via a cosine-based alignment loss ($\mathcal{L}_{\mathrm{pos}}$). Unlabeled samples are trained with a symmetric BCE loss ($\mathcal{L}_{\mathrm{unlab}}$) and dispersed via an angular uniformity regularizer ($\mathcal{L}_{\mathrm{reg}}$). Right: regularization increases angular separation, yielding compact positives and spread-out unlabeled embeddings.
  • Figure 2: Sensitivity to $\lambda$ (cosine regularization weight). Mean $\pm$ std F1 and AUC across 5 seeds.
  • Figure 3: Sensitivity to margin $m$ (fixed angular threshold). Mean $\pm$ std F1 and AUC across 5 seeds.
  • Figure 4: Sensitivity to $\kappa$ (vMF concentration). Mean $\pm$ std F1 and AUC across 5 seeds.
  • Figure 5: Primary configuration (weighted + learnable margin): per-dataset AP comparison between cosine and Euclidean. Cosine is superior on all datasets.
  • ...and 1 more figure

Theorems & Definitions (35)

  • Proposition 4.1: MAP rule for vMF–uniform
  • proof: Proof sketch
  • Remark 1: Isotropic negatives
  • Lemma 4.1: Consistency of the Positive Prototype
  • proof
  • Proposition 4.2: Dispersion via Cosine Regularization
  • proof
  • Corollary 4.1: Upper Bound on Regularization Term
  • proof
  • Theorem A.1: Bayes-optimal decision rule
  • ...and 25 more