Graph-based Semi-Supervised Learning via Maximum Discrimination

Nadav Katz, Ariel Jaffe

TL;DR

This work tackles semi-supervised learning with scarce labels by introducing AUC-spec, a graph-based method that optimizes a discriminative objective combining graph smoothness with AUC-based separation between the labeled classes. The core idea is to replace strict label-value enforcement with an optimization that ranks positive labeled samples above negative ones, implemented via a differentiable sigmoid surrogate and iterative updates on the random-walk graph operator. The authors provide a theoretical analysis under a product-of-manifold model, showing that the solution is smooth and that the required number of labeled samples is polynomial in the model parameters, and they demonstrate empirical competitiveness with state-of-the-art SSL methods at comparable runtime. The approach yields robust separation in settings where traditional label-propagation (LP) methods misalign with class boundaries, and shows promise for scalable, discrimination-aware SSL on real-world graphs. Key implications include improved performance with limited labels and a framework adaptable to multi-class problems and additional discrimination-oriented objectives.
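
To make the ranking idea concrete, here is a minimal sketch of the empirical AUC over labeled points and a sigmoid-smoothed version of it. It is not the authors' implementation: the function names, the 0/1 label encoding, and the steepness parameter `beta` are assumptions made for illustration, and the paper's exact surrogate may differ.

```python
import math


def _split(scores, labels):
    """Split scores into positive (label 1) and negative (label 0) groups."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg


def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos, neg = _split(scores, labels)
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


def sigmoid_auc(scores, labels, beta=5.0):
    """Smooth surrogate: replace the 0/1 pair indicator with sigmoid(beta * (p - n)).

    `beta` is a hypothetical steepness parameter; larger values track the
    exact AUC more closely but give flatter gradients away from the margin.
    """
    pos, neg = _split(scores, labels)
    total = sum(1.0 / (1.0 + math.exp(-beta * (p - n))) for p in pos for n in neg)
    return total / (len(pos) * len(neg))


# Toy labeled set: one positive (0.4) is ranked below one negative (0.6).
scores = [0.9, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0]
exact = empirical_auc(scores, labels)
smooth = sigmoid_auc(scores, labels)
```

Because the surrogate is differentiable in the scores, it can be maximized with gradient-based updates, whereas the exact AUC is piecewise constant.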

Abstract

Semi-supervised learning (SSL) addresses the critical challenge of training accurate models when labeled data is scarce but unlabeled data is abundant. Graph-based SSL (GSSL) has emerged as a popular framework that captures data structure through graph representations. Classic graph SSL methods, such as Label Propagation and Label Spreading, aim to compute low-dimensional representations where points with the same labels are close in representation space. Although often effective, these methods can be suboptimal on data with complex label distributions. In our work, we develop AUC-spec, a graph approach that computes a low-dimensional representation that maximizes class separation. We compute this representation by optimizing the Area Under the ROC Curve (AUC) as estimated via the labeled points. We provide a detailed analysis of our approach under a product-of-manifold model, and show that the required number of labeled points for AUC-spec is polynomial in the model parameters. Empirically, we show that AUC-spec balances class separation with graph smoothness. It demonstrates competitive results on synthetic and real-world datasets while maintaining computational efficiency comparable to the field's classic and state-of-the-art methods.

Paper Structure

This paper contains 21 sections, 3 theorems, 19 equations, 3 figures, 8 tables, 2 algorithms.

Key Result

Lemma 4.1

Let $\hat{\mathbf{v}}$ denote the minimizer of the AUC-spec objective (labeled `eq:objective` in the paper). Then $\hat{\mathbf{v}}^T L \hat{\mathbf{v}} \leq \lambda_K$.
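
The quantity $\mathbf{v}^T L \mathbf{v}$ bounded in the lemma is a standard measure of how smoothly a vector varies over a graph. The sketch below assumes, purely for illustration, an unnormalized Laplacian $L = D - W$ on a 4-node path graph; the paper may use a different normalization, and the helper names are hypothetical. For this choice of $L$, the quadratic form equals $\tfrac{1}{2}\sum_{i,j} W_{ij}(v_i - v_j)^2$, so vectors that change little across edges score low.

```python
def laplacian(W):
    """Unnormalized graph Laplacian L = D - W (an assumption for this sketch)."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def quad_form(L, v):
    """Compute v^T L v directly from the matrix entries."""
    n = len(v)
    return sum(v[i] * L[i][j] * v[j] for i in range(n) for j in range(n))


def edge_sum(W, v):
    """Equivalent edge-wise form: 0.5 * sum_ij W_ij * (v_i - v_j)^2."""
    n = len(v)
    return 0.5 * sum(W[i][j] * (v[i] - v[j]) ** 2
                     for i in range(n) for j in range(n))


# Path graph 0 - 1 - 2 - 3 with unit edge weights.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
L = laplacian(W)

v_smooth = [1, 1, -1, -1]   # changes sign once along the path
v_rough = [1, -1, 1, -1]    # changes sign on every edge

smooth_energy = quad_form(L, v_smooth)
rough_energy = quad_form(L, v_rough)
```

Both vectors have the same norm, yet the one that flips sign on every edge has three times the energy, which is why an upper bound on this quantity can be read as a smoothness guarantee.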

Figures (3)

  • Figure 1: Ring of Gaussians dataset. Points are colored by class; labeled points are shown in red.
  • Figure 2: A Gaussian mixture model. (Left) Points are colored according to their class; labeled points are colored in red. (Middle and right) Points are colored according to the outcome of label propagation (middle) and AUC-spec (right).
  • Figure 3: Points sampled over a rectangle of size $[0,1] \times [0,\alpha]$. The labels are set according to the second coordinate. The results show the AUC score, over the unlabeled points, of the vector computed by AUC-spec.

Theorems & Definitions (5)

  • Lemma 4.1
  • proof
  • Theorem 4.2
  • Lemma A.1
  • proof