Universality of max-margin classifiers

Andrea Montanari, Feng Ruan, Basil Saeed, Youngtak Sohn

TL;DR

This work establishes a universality principle for max-margin classification in high dimensions, covering features produced by a nonlinear randomized feature map as well as features with independent non-Gaussian entries. By matching each non-Gaussian feature model to a Gaussian model with the same feature covariance and the same feature-label covariance, the authors reduce the analysis to the Gaussian setting and show that the margin and the test error have the same high-dimensional limits in both models. A duality-based strategy represents the max-margin classifier as an average over support vectors, whose number grows linearly with the sample size, which enables the transfer of Gaussian results via delocalization and ERM-like reformulations. The results imply that overparameterization thresholds and generalization behavior, including benign overfitting, are governed by the same second-order statistics across a broad class of feature maps. This provides a rigorous bridge between non-Gaussian randomized features and classical Gaussian analyses, with implications for understanding high-dimensional classification in practice.
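
For orientation, the textbook form of the max-margin problem and its support-vector representation can be written as follows. This is a sketch in assumed notation (labels $y_i \in \{+1,-1\}$, feature vectors ${\boldsymbol x}_i \in \mathbb{R}^p$, margin $\kappa_n$, dual weights $\alpha_i$); the paper's own symbols and normalizations may differ.

$$\kappa_n = \max_{\|{\boldsymbol \theta}\|_2 \le 1} \; \min_{i \le n} \; y_i \langle {\boldsymbol \theta}, {\boldsymbol x}_i \rangle .$$

The data are strictly linearly separable exactly when $\kappa_n > 0$. In that case the problem is equivalent to the hard-margin program $\min_{{\boldsymbol w}} \tfrac{1}{2}\|{\boldsymbol w}\|_2^2$ subject to $y_i \langle {\boldsymbol w}, {\boldsymbol x}_i \rangle \ge 1$ for all $i$, whose KKT conditions give

$${\boldsymbol w} = \sum_{i=1}^{n} \alpha_i \, y_i \, {\boldsymbol x}_i, \qquad \alpha_i \ge 0, \qquad \alpha_i \big( y_i \langle {\boldsymbol w}, {\boldsymbol x}_i \rangle - 1 \big) = 0 ,$$

with $\kappa_n = 1/\|{\boldsymbol w}\|_2$. Only samples with $\alpha_i > 0$ (the support vectors) contribute to the sum, which is the sense in which the classifier is an average over support vectors.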

Abstract

Maximum margin binary classification is one of the most fundamental algorithms in machine learning, yet the role of featurization maps and the high-dimensional asymptotics of the misclassification error for non-Gaussian features are still poorly understood. We consider settings in which we observe binary labels $y_i$ and either $d$-dimensional covariates ${\boldsymbol z}_i$ that are mapped to a $p$-dimensional space via a randomized featurization map ${\boldsymbol \phi}:\mathbb{R}^d \to\mathbb{R}^p$, or $p$-dimensional features with independent non-Gaussian entries. In this context, we study two fundamental questions: $(i)$ At what overparametrization ratio $p/n$ do the data become linearly separable? $(ii)$ What is the generalization error of the max-margin classifier? Working in the high-dimensional regime in which the number of features $p$, the number of samples $n$ and the input dimension $d$ (in the nonlinear featurization setting) diverge, with ratios of order one, we prove a universality result establishing that the asymptotic behavior is completely determined by the expected covariance of feature vectors and by the covariance between features and labels. In particular, the overparametrization threshold and generalization error can be computed within a simpler Gaussian model. The main technical challenge lies in the fact that the max-margin classifier is not the maximizer (or minimizer) of an empirical average, but the maximizer of a minimum over the samples. We address this by representing the classifier as an average over support vectors. Crucially, we find that in high dimensions, the support vector count is proportional to the number of samples, which ultimately yields universality.
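
The distinction between a minimum over samples and an empirical average can be seen on a toy example. The sketch below is illustrative only: the function names `margin` and `average_score`, the four data points, and the two candidate directions are all made up here and do not come from the paper. It shows that the two criteria can rank the same pair of directions in opposite orders.

```python
import math


def _scores(theta, xs, ys):
    """Signed scores y_i * <theta, x_i> / ||theta||_2 for every sample."""
    norm = math.sqrt(sum(t * t for t in theta))
    return [
        y * sum(t * x for t, x in zip(theta, row)) / norm
        for row, y in zip(xs, ys)
    ]


def margin(theta, xs, ys):
    """Normalized margin: the MINIMUM signed score over the samples."""
    return min(_scores(theta, xs, ys))


def average_score(theta, xs, ys):
    """Empirical AVERAGE of the signed scores (an ERM-style objective)."""
    scores = _scores(theta, xs, ys)
    return sum(scores) / len(scores)


# Hypothetical toy data: four points in R^2 with labels in {+1, -1}.
xs = [[2.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [-3.0, 0.5]]
ys = [1, 1, -1, -1]

# Two candidate directions (both separate the data, so both margins are > 0).
theta_a = [1.0, 0.0]
theta_b = [1.0, 1.0]

for name, theta in [("theta_a", theta_a), ("theta_b", theta_b)]:
    print(name, "margin:", margin(theta, xs, ys),
          "average:", average_score(theta, xs, ys))
```

On this data `theta_b` has the larger margin while `theta_a` has the larger average score, so maximizing the worst-case score is a different problem from maximizing an average over samples.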

Paper Structure

This paper contains 66 sections, 48 theorems, 289 equations.

Key Result

Theorem 1

Consider either $({\boldsymbol X},{\boldsymbol G})=({\boldsymbol X}_{{\sf RF}},{\boldsymbol G}_{{\sf RF}})$ under Assumption assumption:RF or $({\boldsymbol X},{\boldsymbol G})=({\boldsymbol X}_{\sf ind},{\boldsymbol G}_{\sf ind})$ under Assumption assumption:ind. Then, for any Lipschitz function $\

Theorems & Definitions (82)

  • Remark 3.1
  • Remark 3.2
  • Remark 3.3
  • Theorem 1: Universality of the margin
  • Theorem 2: Universality of the test error
  • Corollary 1
  • Lemma 1: Restricted Strong Convexity
  • Proposition 1
  • Proposition 2
  • Remark 4.1
  • ...and 72 more