Semi-Supervised Laplace Learning on Stiefel Manifolds

Chester Holtz, Pengwen Chen, Alexander Cloninger, Chung-Kuan Cheng, Gal Mishne

TL;DR

This work addresses graph-based semi-supervised learning (SSL) when labels are extremely scarce, a regime in which classical Laplacian-based methods degenerate. It reframes SSL as a nonconvex quadratically constrained quadratic program (QCQP) over the compact Stiefel manifold. It then develops a scalable pipeline that combines an Orthogonal Procrustes initialization with a Sequential Subspace Method (SSM) and cut-based refinement to propagate labels robustly. An active-learning score derived from the grounded Laplacian, motivated by absorbing random walks, guides the selection of diverse samples and improves performance at low label rates. Experiments on MNIST, Fashion-MNIST, and CIFAR-10 show state-of-the-art accuracy from low to high label rates, highlighting both the predictive power and the practical efficiency of the SSM framework and its active-learning extension.
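The grounded Laplacian is the submatrix $L_{\mathcal{U}\mathcal{U}}$ obtained by deleting the rows and columns of labeled vertices. Its inverse is closely tied to an absorbing random walk that stops whenever it reaches a labeled vertex. The sketch below illustrates the active-learning idea only roughly. It does not reproduce the paper's exact score. Each unlabeled vertex is scored by its entry in the eigenvector of $L_{\mathcal{U}\mathcal{U}}$ with the smallest eigenvalue, which is equivalently the principal eigenvector of $L_{\mathcal{U}\mathcal{U}}^{-1}$, and the top-scoring vertex is queried next. Vertices far from the labeled set tend to receive large entries. The helper names `grounded_laplacian_score` and `select_next` are hypothetical, and NumPy is assumed.

```python
import numpy as np

def grounded_laplacian_score(W, labeled):
    """Score unlabeled vertices with the lowest eigenvector of the grounded
    Laplacian L_UU (rows/columns of labeled vertices removed).

    Hypothetical helper: an illustrative proxy, not the paper's exact score.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W             # combinatorial Laplacian L = D - W
    unlabeled = np.setdiff1d(np.arange(n), labeled)
    L_uu = L[np.ix_(unlabeled, unlabeled)]     # grounded Laplacian
    _, eigvecs = np.linalg.eigh(L_uu)          # eigenvalues sorted ascending
    score = np.abs(eigvecs[:, 0])              # fix the arbitrary sign of the eigenvector
    return unlabeled, score

def select_next(W, labeled):
    """Hypothetical query rule: label the unlabeled vertex with the largest score."""
    unlabeled, score = grounded_laplacian_score(W, labeled)
    return int(unlabeled[np.argmax(score)])

# Usage: a 5-vertex path graph 0-1-2-3-4 with vertex 0 labeled.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
print(select_next(W, [0]))  # -> 4, the vertex farthest from the labeled one
```

On a connected graph, this eigenvector has entries of one sign whenever the unlabeled vertices form a connected subgraph. Taking absolute values therefore only removes the sign ambiguity returned by `eigh`.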

Abstract

Motivated by the need to address the degeneracy of canonical Laplace learning algorithms at low label rates, we propose to reformulate graph-based semi-supervised learning as a nonconvex generalization of a \emph{Trust-Region Subproblem} (TRS). This reformulation is motivated by the well-posedness of Laplacian eigenvectors in the limit of infinite unlabeled data. To solve this problem, we first show that a first-order condition implies the solution of a manifold alignment problem. We also show that solutions to the classical \emph{Orthogonal Procrustes} problem can be used to efficiently find good classifiers that are amenable to further refinement. To tackle refinement, we develop the framework of Sequential Subspace Optimization for graph-based SSL. Next, we address the importance of selecting which samples to label at low label rates. We characterize informative samples with a novel measure of centrality derived from the principal eigenvectors of a submatrix of the graph Laplacian (the grounded Laplacian). We demonstrate that our framework achieves lower classification error than recent state-of-the-art and classical semi-supervised learning methods at extremely low, medium, and high label rates.
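For reference, the canonical Laplace learning baseline that the reformulation starts from can be sketched in a few lines. Labeled rows are fixed to one-hot vectors, and unlabeled rows solve the harmonic system $L_{\mathcal{U}\mathcal{U}} X_\mathcal{U} = -L_{\mathcal{U}l} Y_l$. This is the standard harmonic-extension formulation, written here with NumPy. The function name is hypothetical, and the toy graph was chosen only so that the result can be checked by hand. The degeneracy the abstract refers to shows up on large graphs with very few labels, where the solution becomes nearly constant away from the labeled vertices (see Figure 2). A six-vertex example does not exhibit it.

```python
import numpy as np

def laplace_learning(W, labeled, Y_l):
    """Classical Laplace learning (harmonic extension).

    Solves L_UU X_U = -L_UL Y_L, so every unlabeled row of X is the weighted
    average of its neighbours' rows, with labeled rows fixed to Y_L.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    unlabeled = np.setdiff1d(np.arange(n), labeled)
    L_uu = L[np.ix_(unlabeled, unlabeled)]
    L_ul = L[np.ix_(unlabeled, labeled)]
    X_u = np.linalg.solve(L_uu, -L_ul @ Y_l)
    return unlabeled, X_u

# Usage: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3,
# with vertex 0 labeled class 0 and vertex 5 labeled class 1.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
Y_l = np.eye(2)                       # one-hot labels for vertices 0 and 5
unlabeled, X_u = laplace_learning(W, [0, 5], Y_l)
print(unlabeled, X_u.argmax(axis=1))  # -> [1 2 3 4] [0 0 1 1]
```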

Paper Structure

This paper contains 29 sections, 15 theorems, 107 equations, 6 figures, 6 tables, and 3 algorithms.

Key Result

proposition 1

Let $p$ be a positive scalar and consider the minimization […]. Let $r = -X_l^\top \mathbf{1}$, $P = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$, and […]. Then $X_\mathcal{U} = XC^{1/2} + \frac{1}{n} \mathbf{1} r^\top$, where $X$ is the minimizer of […].
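The displayed problems of Proposition 1 did not survive extraction, but the affine change of variables itself can still be checked numerically. The check rests on three assumptions that are not taken from the statement. First, $X$ has centered columns ($PX = X$). Second, $X$ has orthonormal columns ($X^\top X = I$). Third, $C$ is symmetric positive definite. Under these assumptions, $X_\mathcal{U} = XC^{1/2} + \frac{1}{n}\mathbf{1}r^\top$ satisfies $\mathbf{1}^\top X_\mathcal{U} = r^\top$, so the stacked labeled and unlabeled embedding has zero column sums. It also satisfies $X_\mathcal{U}^\top X_\mathcal{U} = C + \frac{1}{n} r r^\top$, because the cross terms vanish when $\mathbf{1}^\top X = 0$. Here $n$ is taken to be the number of rows of $X_\mathcal{U}$. All sizes and random data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 50, 3, 5                      # hypothetical: n unlabeled rows, k classes, m labeled rows
X_l = rng.standard_normal((m, k))       # stand-in for the labeled block X_l
r = -X_l.T @ np.ones(m)                 # r = -X_l^T 1
P = np.eye(n) - np.ones((n, n)) / n     # centering projector P = I - (1/n) 1 1^T

# Assumed constraints on X: PX = X and X^T X = I (QR of a centered matrix gives both).
X, _ = np.linalg.qr(P @ rng.standard_normal((n, k)))

# Assumed symmetric positive definite C and its principal square root.
A = rng.standard_normal((k, k))
C = A @ A.T + np.eye(k)
w, V = np.linalg.eigh(C)
C_half = V @ np.diag(np.sqrt(w)) @ V.T

# Affine change of variables from Proposition 1.
X_U = X @ C_half + np.outer(np.ones(n), r) / n

print(np.allclose(np.ones(n) @ X_U, r))                    # column sums equal r^T
print(np.allclose(X_U.T @ X_U, C + np.outer(r, r) / n))    # Gram matrix identity
```

Both checks print `True`. Under these assumptions, the reparametrization turns the affine constraint on column sums and the Gram-matrix constraint into conditions on $X$ alone.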

Figures (6)

  • Figure 1: Eigenvector method and projection example on the barbell graph. (a): Embedding into $\mathbb{R}^k$ via Laplacian Eigenmaps. (b): Several iterations of gradient-based repulsion are applied to remove vertex overlaps for better visualization. (c): An arbitrary vertex from each clique is assigned a label (green vertices); the spectral embedding is likely to be inconsistent with these labels. (d): Procrustes embedding. The orthogonal transform $Q$ is derived from Prop. \ref{rem:negdef} and applied to $X$; $XQ$ resolves the discrepancy between the embedding and the labeled vertices.
  • Figure 2: Barcode plots of MNIST predictors (left) and embeddings of samples for digits '2' and '7' (right). Learning is performed with 1 label per class. In the barcode plots, rows are samples ordered by class; the column ordering is obtained by iteratively sorting the columns of the embedding matrices $X$. (a, b) Laplace learning exhibits degeneracy in the limit of unlabeled data. (c, d) Embeddings derived using Procrustes analysis (Section \ref{sec:approx}) exhibit no degeneracy but mix samples from different classes. (e, f) SSM exhibits good classification performance (a block-diagonally dominant barcode and well-separated embeddings) while respecting the geometry of unlabeled examples.
  • Figure 3: Visualization of the lower-bound estimate on a ring of Gaussians. Labeled points are annotated as red circles, and points to be labeled are marked as red stars. Brighter regions of the heatmap indicate vertices with higher scores.
  • Figure 4: Scaling behavior as the number of labeled vertices increases beyond the low-label-rate regime, on (a) MNIST and (b) F-MNIST. The x-axis is the label rate ($\times 10^{3}$) and the y-axis is accuracy, averaged over 10 trials. We use the publicly available implementation of Poisson Learning calder2017consistency.
  • Figure 5: Robust performance of SSM on F-MNIST. (a) Robustness to the number of neighbors $k$ used to construct the graph, averaged over 10 trials with 5 labels per class. (b) The first-order condition on a log scale, i.e., the empirical rate of convergence of the projected gradient method and SSM on F-MNIST with 5 labels per class.
  • ...and 1 more figure
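Figure 1 illustrates the Procrustes idea on a barbell graph: a Laplacian-eigenmap embedding $X$ is rotated by an orthogonal $Q$ so that the labeled rows line up with their labels. Below is a minimal NumPy sketch of that step. It assumes the targets are one-hot label vectors and uses the classical closed-form Orthogonal Procrustes solution. The paper derives its $Q$ from its own proposition, which may differ in details. The helper name `procrustes_rotation` and the six-vertex graph are hypothetical.

```python
import numpy as np

def procrustes_rotation(X_l, Y_l):
    """Orthogonal Procrustes: argmin over orthogonal Q of ||X_l Q - Y_l||_F.

    Closed form: Q = U V^T, where U S V^T is the SVD of X_l^T Y_l.
    """
    U, _, Vt = np.linalg.svd(X_l.T @ Y_l)
    return U @ Vt

# Tiny barbell graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

# Laplacian-eigenmap embedding: eigenvectors of the k smallest eigenvalues.
k = 2
_, eigvecs = np.linalg.eigh(L)
X = eigvecs[:, :k]

labeled = [0, 5]               # one labeled vertex per clique
Y_l = np.eye(k)                # one-hot targets: vertex 0 -> class 0, vertex 5 -> class 1
Q = procrustes_rotation(X[labeled], Y_l)
print((X @ Q).argmax(axis=1))  # -> [0 0 0 1 1 1]
```

The raw eigenvector coordinates carry arbitrary signs and an arbitrary basis, so reading classes directly off $X$ is unreliable. The rotation $XQ$ fixes the basis using only the labeled rows.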

Theorems & Definitions (24)

  • proposition 1
  • proposition 2: Definiteness conditions of $X^\top B$
  • proposition 3: Hager2005GlobalSSM
  • proposition 4: Hager2001MinimizingAQ
  • remark 1: Hager2005GlobalSSM
  • proposition 5: Projection onto $\mathcal{M}$
  • proposition 6: SQP iterate of the Lagrangian of eq. (\ref{eq:rescaled_f_lagrangian})
  • remark 2
  • remark 3
  • proposition 7: Eigenvalues of $\Lambda_*$
  • ...and 14 more
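Figure 5 compares the convergence of SSM with that of a projected gradient method, and Proposition 5 concerns projection onto $\mathcal{M}$. The code below is a hedged baseline sketch, not the paper's SSM. It assumes $\mathcal{M} = \{X : X^\top X = pI\}$ and a TRS-style objective $f(X) = \operatorname{tr}(X^\top L X) - 2\operatorname{tr}(B^\top X)$, chosen by analogy with the vector TRS objective $x^\top A x - 2b^\top x$. Both the set and the objective are assumptions, as are the function names. The projection uses the polar factor from a thin SVD. The step size $1/(4\lambda_{\max}(L))$ lies below the reciprocal of the gradient's Lipschitz constant $2\lambda_{\max}(L)$, which guarantees that the objective never increases.

```python
import numpy as np

def project_to_M(Y, p=1.0):
    """Nearest point (Frobenius norm) to Y in the assumed set {X : X^T X = p I}.

    For a full-column-rank Y with thin SVD U S V^T, this is sqrt(p) U V^T.
    """
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return np.sqrt(p) * (U @ Vt)

def objective(X, L, B):
    """Assumed TRS-style objective f(X) = tr(X^T L X) - 2 tr(B^T X)."""
    return np.trace(X.T @ L @ X) - 2.0 * np.trace(B.T @ X)

def projected_gradient(L, B, X0, p=1.0, iters=200):
    """Plain projected gradient descent on the assumed problem (not SSM)."""
    lip = 2.0 * np.linalg.eigvalsh(L)[-1]   # Lipschitz constant of grad f = 2(LX - B)
    step = 0.5 / lip                        # any step below 1/lip gives monotone descent
    X = project_to_M(X0, p)
    history = [objective(X, L, B)]
    for _ in range(iters):
        grad = 2.0 * (L @ X - B)
        X = project_to_M(X - step * grad, p)
        history.append(objective(X, L, B))
    return X, history

# Usage on a hypothetical 8-vertex path graph with random B and X0.
rng = np.random.default_rng(0)
n, k, p = 8, 2, 1.0
W = np.diag(np.ones(n - 1), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W
B = rng.standard_normal((n, k))
X, history = projected_gradient(L, B, rng.standard_normal((n, k)), p)
print(np.allclose(X.T @ X, p * np.eye(k)), history[0] >= history[-1])
```

Descent follows from two facts. First, because the current iterate already lies in $\mathcal{M}$, the projected point is at least as close to $X - \eta\nabla f$ as $X$ is. Second, the standard descent lemma for a gradient with Lipschitz constant $2\lambda_{\max}(L)$ applies. Tracking `history`, or a first-order residual, gives the kind of convergence curve plotted in Figure 5.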