Data-Dependent Complexity of First-Order Methods for Binary Classification

Matthew Hough; Stephen A. Vavasis

TL;DR

The paper investigates data-dependent iteration complexity for binary classification tasks solved with first-order methods. It develops a FISTA-based approach for two problems: the Ellipsoid Separation Problem (ESP) and the soft-margin SVM, deriving data-driven stopping criteria that rely on geometric properties of the data rather than worst-case algorithmic constants. For ESP, a dual SOCP formulation lets FISTA yield a separating hyperplane via its residual, with an explicit iteration upper bound scaling as $\mathcal{O}(1/\delta_{\mathcal{A}}^2)$, where $\delta_{\mathcal{A}}$ is the minimal perturbation that destroys separability. For SVM, a strongly concave perturbed dual has a unique solution and enables efficient identification of well-classified points and a hyperplane separating them, with empirical results showing competitive runtimes against LIBSVM and LIBLINEAR and speedups from early stopping. Overall, the work demonstrates practical, data-dependent stopping rules that accelerate large-scale binary classification while providing theoretical guarantees tied to data geometry.

Abstract

Large-scale problems in data science are often modeled with optimization, and the optimization model is usually solved with first-order methods that may converge at a sublinear rate. Therefore, it is of interest to terminate the optimization algorithm as soon as the underlying data science task is accomplished. We consider FISTA for solving two binary classification problems: the ellipsoid separation problem (ESP) and the soft-margin support-vector machine (SVM). For the ESP, we cast the dual second-order cone program into a form amenable to FISTA and show that the FISTA residual converges to the infimal displacement vector of the primal-dual hybrid gradient (PDHG) algorithm, which directly encodes a separating hyperplane. We further derive a data-dependent iteration upper bound scaling as $\mathcal{O}(1/\delta_{\mathcal{A}}^2)$, where $\delta_{\mathcal{A}}$ is the minimal perturbation that destroys separability. For the SVM, we propose a strongly concave perturbed dual that admits efficient FISTA updates under a linear-time projection scheme; with our parameter choices, the objective has a small condition number, enabling rapid convergence. We prove that, under a reasonable data model, early-stopped iterates identify well-classified points and yield a hyperplane that exactly separates them, where the accuracy required of the dual iterate is governed by geometric properties of the data. In particular, the proposed early-stopping criteria diminish the need for hard-to-select tolerance-based stopping conditions. Our numerical experiments on ESP instances derived from MNIST data and on soft-margin SVM benchmarks indicate competitive runtimes and substantial speedups from stopping early.
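For readers unfamiliar with FISTA, the following is a minimal sketch of the generic iteration (a projected gradient step followed by a momentum extrapolation) on a toy box-constrained quadratic. It is not the paper's ESP or SVM formulation: the function name `fista_box_qp`, the toy objective, the simple box projection, and the fixed iteration count are all assumptions made for illustration, and the paper's data-dependent stopping rule is not reproduced here.

```python
def fista_box_qp(Q, c, upper, step, iters=200):
    """Minimize 0.5*x'Qx - c'x over the box [0, upper]^n with FISTA.

    Hypothetical toy problem for illustration only; `step` should be at
    most 1/L, where L is the largest eigenvalue of Q.
    """
    n = len(c)
    x = [0.0] * n   # current iterate
    y = x[:]        # extrapolated point
    t = 1.0         # momentum parameter
    for _ in range(iters):
        # Gradient of the smooth objective at the extrapolated point y.
        grad = [sum(Q[i][j] * y[j] for j in range(n)) - c[i] for i in range(n)]
        # Projected gradient step: clip each coordinate to [0, upper].
        x_new = [min(upper, max(0.0, y[i] - step * grad[i])) for i in range(n)]
        # Standard FISTA momentum update.
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        y = [x_new[i] + ((t - 1.0) / t_new) * (x_new[i] - x[i]) for i in range(n)]
        x, t = x_new, t_new
    return x


# Toy usage: Q is diagonal, so the constrained minimizer can be checked by hand.
solution = fista_box_qp([[2.0, 0.0], [0.0, 1.0]], [1.0, 3.0],
                        upper=2.0, step=0.5, iters=50)
print(solution)
```

In this toy instance the unconstrained minimizer is (0.5, 3), so the second coordinate is clipped to the upper bound of the box. A fixed iteration count stands in for a stopping rule here; the point of the paper is to replace such generic budgets or tolerances with criteria tied to the geometry of the data.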
