Replicable Learning of Large-Margin Halfspaces

Alkis Kalavasis; Amin Karbasi; Kasper Green Larsen; Grigoris Velegkas; Felix Zhou

TL;DR

The paper studies replicable learning of large-margin halfspaces in $\mathbb{R}^d$ with margin $\tau$, aiming for dimension-independent sample complexity and polynomial running time. It develops several replicable algorithms that combine Johnson-Lindenstrauss dimensionality reduction, the Alon-Klartag rounding scheme, and batch-based aggregation, including an SGD-based approach with boosting. The main contributions are: a dimension-independent, polynomial-time algorithm with sample complexity $\tilde{O}(\epsilon^{-1} \tau^{-7} \rho^{-2})$ (Alg2); an SGD-based variant with sample complexity $\tilde{O}(\epsilon^{-2} \tau^{-6} \rho^{-2})$ (Alg4); a computationally inefficient method obtained through the DP-to-replicability reduction, which improves the dependence on $\tau$ to $\tilde{O}(\epsilon^{-2} \tau^{-4} \rho^{-2})$ samples; and a net-based method (Alg3-inefficient) that is more sample-efficient still, at $\tilde{O}(\epsilon^{-1} \tau^{-4} \rho^{-2})$ samples, but runs in time exponential in $1/\tau^{2}$. The work links replicability to stability and differential privacy and makes explicit the trade-offs between accuracy, margin, replicability, and running time. Overall, it advances replicable learning of margin-based classifiers by giving dimension-free, efficient procedures and by clarifying what additional sample efficiency costs in running time.
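To make the notion of replicability concrete: an algorithm is $\rho$-replicable if, when run on two independent samples from the same distribution with the same internal randomness, it returns the identical output with probability at least $1-\rho$. The sketch below illustrates how rounding with shared randomness can achieve this for a one-dimensional mean. It is an illustration only and is not the Alon-Klartag scheme or any algorithm from the paper; the function names `replicable_mean` and `agreement_rate`, the Gaussian data, and all parameter values are assumptions made for the example.

```python
import numpy as np

def replicable_mean(sample, grid_width, shared_rng):
    """Round the empirical mean to a grid whose offset is drawn from shared randomness.

    Hypothetical helper for illustration; not taken from the paper.
    """
    offset = shared_rng.uniform(0.0, grid_width)
    mean = float(np.mean(sample))
    # Snap to the nearest grid point of the form offset + k * grid_width.
    k = np.round((mean - offset) / grid_width)
    return float(offset + k * grid_width)

def agreement_rate(n_samples, grid_width, trials=500, seed=0):
    """Fraction of trials in which two independent samples yield the same output."""
    data_rng = np.random.default_rng(seed)
    agree = 0
    for t in range(trials):
        # Two independent samples from the same (assumed Gaussian) distribution.
        s1 = data_rng.normal(0.3, 1.0, size=n_samples)
        s2 = data_rng.normal(0.3, 1.0, size=n_samples)
        # Re-seeding with the same value gives both runs the same internal randomness.
        out1 = replicable_mean(s1, grid_width, np.random.default_rng(t))
        out2 = replicable_mean(s2, grid_width, np.random.default_rng(t))
        agree += int(out1 == out2)
    return agree / trials

if __name__ == "__main__":
    for n in (25, 2500):
        print(n, agreement_rate(n, grid_width=0.2))
```

With a random grid offset, two empirical means at distance $d$ land on the same grid point with probability about $1 - d/w$ for grid width $w$, so larger samples (smaller $d$) agree more often. This is the usual tension between accuracy, sample size, and the replicability parameter that the bounds above quantify.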

Abstract

We provide efficient replicable algorithms for the problem of learning large-margin halfspaces. Our results improve upon the algorithms provided by Impagliazzo, Lei, Pitassi, and Sorrell [STOC, 2022]. We design the first dimension-independent replicable algorithm for this task that runs in polynomial time, is proper, and has strictly improved sample complexity compared to the one achieved by Impagliazzo et al. [2022] with respect to all the relevant parameters. Moreover, our first algorithm has sample complexity that is optimal with respect to the accuracy parameter $\epsilon$. We also design an SGD-based replicable algorithm that, in some parameter regimes, achieves better sample and time complexity than our first algorithm. Departing from the requirement of polynomial-time algorithms, using the DP-to-Replicability reduction of Bun, Gaboardi, Hopkins, Impagliazzo, Lei, Pitassi, Sorrell, and Sivakumar [STOC, 2023], we show how to obtain a replicable algorithm for large-margin halfspaces with improved sample complexity with respect to the margin parameter $\tau$, but with running time doubly exponential in $1/\tau^{2}$ and worse sample complexity dependence on $\epsilon$ than one of our previous algorithms. We then design an improved algorithm with better sample complexity than all three of our previous algorithms and running time exponential in $1/\tau^{2}$.
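The "parameter regimes" remark can be made concrete with the sample-complexity bounds quoted in the TL;DR. The sketch below evaluates only the leading terms $\epsilon^{-a} \tau^{-b} \rho^{-2}$ and ignores the constants and polylogarithmic factors hidden by $\tilde{O}$, so it is suggestive rather than exact; the dictionary `BOUNDS` and the helper names are assumptions introduced for this example. Dividing the Alg4 term by the Alg2 term gives $\tau/\epsilon$, so under this simplification the SGD-based bound is smaller when $\epsilon > \tau$.

```python
# Exponents (a, b) in the leading term eps^-a * tau^-b * rho^-2,
# copied from the bounds quoted in the TL;DR. Labels are illustrative.
BOUNDS = {
    "Alg2 (efficient)": (1, 7),
    "Alg4 (SGD-based)": (2, 6),
    "DP-to-replicability": (2, 4),
    "Alg3 (net-based)": (1, 4),
}

def leading_term(eps, tau, rho, a, b):
    """Leading term of the bound, ignoring constants and log factors."""
    return eps ** -a * tau ** -b * rho ** -2

def compare(eps, tau, rho):
    """Leading term of every bound at the given parameters."""
    return {name: leading_term(eps, tau, rho, a, b)
            for name, (a, b) in BOUNDS.items()}

if __name__ == "__main__":
    # One setting with eps > tau and one with eps < tau (values are arbitrary).
    for eps, tau in [(0.1, 0.01), (0.01, 0.1)]:
        terms = compare(eps, tau, rho=0.1)
        for name, value in sorted(terms.items(), key=lambda kv: kv[1]):
            print(f"eps={eps}, tau={tau}: {name:22s} {value:.3g}")
```

The comparison covers sample complexity only. The two inefficient methods have the smallest leading terms for all parameters in $(0,1)$ but, per the abstract, pay with running time exponential or doubly exponential in $1/\tau^{2}$.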

Paper Structure (35 sections, 26 theorems, 27 equations, 1 table, 3 algorithms)

Introduction
Our Contribution
Computationally Inefficient Reductions from DP.
Related Work
Replicability.
Large-Margin Halfspaces.
The Main Tool: The Alon-Klartag Rounding Scheme
Replicably Learning Large-Margin Halfspaces: Alg2
Description of Alg2.
Correctness of Alg2.
Replicability of Alg2.
Sample Complexity & Running Time of Alg2.
Replicably Learning Large-Margin Halfspaces: Alg4
Description of Alg4.
Replicably Learning Large-Margin Halfspaces: Alg3-inefficient
...and 20 more sections

Key Result

Theorem 1.3

Fix $\epsilon, \tau, \rho, \delta \in (0, 1)$. Let $\mathcal{D}$ be a distribution over $\mathbb{R}^d \times \{-1,1\}$ that has linear margin $\tau$ as in Definition 1.2. There is an algorithm that is $\rho$-replicable and, given $m = \widetilde{O}(\epsilon^{-1} \tau^{-7} \rho^{-2} \log(1/\delta))$ i.i.d.

Theorems & Definitions (35)

Definition 1.1: Replicability [Impagliazzo et al., 2022]
Definition 1.2: Large-Margin Halfspaces
Theorem 1.3: Efficient Replicable Alg2
Theorem 1.4: Efficient Replicable Alg4
Proposition 1.5: Inefficient Replicable Algorithm; follows from [le2020efficient, bun2023stability]
Theorem 1.6: Improved Inefficient Replicable Alg3
Lemma 2.0: Stability of Rounding
Lemma 2.0: Rounding preserves Inner Products
Lemma 3.0
Lemma 3.0
...and 25 more
