Faster Margin Maximization Rates for Generic and Adversarially Robust Optimization Methods

Guanghui Wang; Zihao Hu; Claudio Gentile; Vidya Muthukumar; Jacob Abernethy

TL;DR

This paper introduces a unified online-learning framework that recasts empirical risk minimization (ERM) with the exponential loss as a regularized bilinear game, so that margin guarantees follow from bounds on the averaged regret. It derives the fastest known margin-convergence rates for generic methods. For mirror descent with the squared $q$-norm potential, the margin rate is $\mathcal{O}\left(\frac{\log n\log T}{(q-1)T}\right)$, which can be improved to $\mathcal{O}\left(\frac{1}{T(q-1)}+\frac{\log n\log T}{T^2}\right)$. For steepest descent with a strongly convex norm, the rate is $\mathcal{O}\left(\frac{\log n}{T}\right)$, and accelerated variants yield $\mathcal{O}\left(\frac{\log n}{T^2(q-1)}\right)$. In adversarial training, normalized gradient descent achieves $\mathcal{O}\left(\frac{\log n}{T}\right)$ for $s\in(1,2]$ and $\mathcal{O}\left(\frac{\log n}{T^2}\right)$ with acceleration (and $\mathcal{O}\left(\frac{\log n}{T}\right)$ for $s>2$). The paper also gives a multilinear extension with perturbation players. The framework hinges on turning ERM into a two-player online dynamic with regret bounds, which yields simple, data-dependent margin guarantees and insight into implicit bias in both clean and adversarial settings.
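To make the mirror-descent setting above concrete, here is a minimal sketch of mirror descent on the exponential loss with the potential $\frac{1}{2}\|\mathbf{w}\|_q^2$. It is an illustration under stated assumptions, not the paper's algorithm: the toy data, the step size `eta`, the choice `q=1.5`, and the helper names are all hypothetical, rows of `A` are assumed to be label-folded examples $y_i\mathbf{x}_i$, and the margin is assumed to be measured in the same $\ell_q$ norm. The sketch relies on the standard fact that the gradient of $\frac{1}{2}\|\cdot\|_p^2$ with $1/p+1/q=1$ inverts the gradient of $\frac{1}{2}\|\cdot\|_q^2$.

```python
import numpy as np

def grad_half_sq_norm(v, r):
    """Gradient of 0.5 * ||v||_r^2 for r > 1 (the zero vector maps to zero)."""
    norm = np.linalg.norm(v, ord=r)
    if norm == 0.0:
        return np.zeros_like(v)
    return np.sign(v) * np.abs(v) ** (r - 1) / norm ** (r - 2)

def mirror_descent_exp_loss(A, q=1.5, eta=0.1, steps=500):
    """Mirror descent on L(w) = mean(exp(-A @ w)) with potential 0.5 * ||w||_q^2.

    Rows of A are assumed to be y_i * x_i (an assumption of this sketch).
    """
    p = q / (q - 1.0)  # conjugate exponent, 1/p + 1/q = 1
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = -(np.exp(-A @ w)[:, None] * A).mean(axis=0)
        theta = grad_half_sq_norm(w, q) - eta * grad  # gradient step in the dual space
        w = grad_half_sq_norm(theta, p)               # map back to the primal space
    return w

def margin(A, w, q):
    """Normalized margin min_i (A @ w)_i / ||w||_q."""
    return (A @ w).min() / np.linalg.norm(w, ord=q)

# Hypothetical, linearly separable toy data (label-folded rows).
A = np.array([[2.0, 1.0], [1.0, 3.0], [0.5, 2.0], [3.0, 0.2]])
w_hat = mirror_descent_exp_loss(A)
print("q-norm margin of the final iterate:", margin(A, w_hat, q=1.5))
```

The iterate norm keeps growing while the loss shrinks, so only the direction of `w_hat` and its normalized margin are meaningful here.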

Abstract

First-order optimization methods tend to inherently favor certain solutions over others when minimizing an underdetermined training objective that has multiple global optima. This phenomenon, known as implicit bias, plays a critical role in understanding the generalization capabilities of optimization algorithms. Recent research has revealed that in separable binary classification tasks gradient-descent-based methods exhibit an implicit bias for the $\ell_2$-maximal margin classifier. Similarly, generic optimization methods, such as mirror descent and steepest descent, have been shown to converge to maximal margin classifiers defined by alternative geometries. While gradient-descent-based algorithms provably achieve fast implicit bias rates, corresponding rates in the literature for generic optimization methods are relatively slow. To address this limitation, we present a series of state-of-the-art implicit bias rates for mirror descent and steepest descent algorithms. Our primary technique involves transforming a generic optimization algorithm into an online optimization dynamic that solves a regularized bilinear game, providing a unified framework for analyzing the implicit bias of various optimization methods. Our accelerated rates are derived by leveraging the regret bounds of online learning algorithms within this game framework. We then show the flexibility of this framework by analyzing the implicit bias in adversarial training, and again obtain significantly improved convergence rates.
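The implicit bias described in the abstract can be observed on a two-point toy problem. The sketch below runs gradient descent on the exponential loss with the step rescaled by the current loss value, a common normalized variant that is assumed here rather than taken from the paper. The data, the step size, and the function names are hypothetical. For this particular data set the $\ell_2$ max-margin direction is $(1,2)/\sqrt{5}$ with margin $2/\sqrt{5}$, which the code uses as a reference value.

```python
import numpy as np

def l2_margin(A, w):
    """Normalized l2 margin min_i (A @ w)_i / ||w||_2."""
    return (A @ w).min() / np.linalg.norm(w)

def normalized_gd(A, steps, eta=0.5):
    """Gradient descent on L(w) = mean(exp(-A @ w)) with step size eta / L(w).

    The update -grad L(w) / L(w) equals p @ A, where p is the softmax of the
    negated margins, so it is computed in that numerically stable form.
    """
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        m = A @ w
        p = np.exp(-(m - m.min()))  # shift by the minimum to avoid underflow
        p /= p.sum()                # weights over examples, summing to one
        w = w + eta * (p @ A)
    return w

# Hypothetical toy data: rows are label-folded examples y_i * x_i.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
gamma_star = 2.0 / np.sqrt(5.0)  # max l2 margin for this data, at direction (1, 2) / sqrt(5)

w = normalized_gd(A, steps=1000)
print("direction:", w / np.linalg.norm(w))
print("margin gap:", gamma_star - l2_margin(A, w))
```

The weights `p` form a distribution over training examples that concentrates on the points with the smallest margins. This loosely echoes the two-player view mentioned in the abstract, though the sketch does not implement the paper's game or its regret analysis.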

Paper Structure (24 sections, 15 theorems, 97 equations, 12 figures, 1 table, 10 algorithms)

Introduction
Main results and techniques
Additional Related Work
Preliminaries
A Game Framework for Maximizing the Margin
Proof of Theorem \ref{thm:margin}
Implicit Bias of Generic Methods
Mirror-Descent-Type of Methods
Steepest Descent
Even Faster Rates with Accelerated Generic Methods
Implicit Bias in Adversarial Training
Basic Setting
Understanding $\ell_s$-AT Via The Game Framework
$\ell_s$-AT with Gradient Descent
$\ell_s$-AT With Nesterov-style Acceleration
...and 9 more sections

Key Result

Theorem 1

Suppose Assumption ass:only holds with respect to some general norm $\|\cdot\|$. Consider solving the two-player zero-sum game defined in eqn:defn:game by applying Protocol pro:no-regret for game. Then $\widetilde{\mathbf{w}}_T$ will have a positive margin on round $T$ if $C_T\leq \frac{\gamma^2}{4}$. If $\Phi(\mathbf{w})$ is $\lambda$-strongly convex with respect to the norm $\|\cdot\|$, we have

Figures (12)

Figure 1: Illustration of the game framework for implicit bias analysis. In Section \ref{section:gameframework}, we show that solving a regularized bilinear game with online learning algorithms (top box) directly maximizes the margin, and that the convergence rate is of the same order as the averaged regret $C_T$ (right box). In Section \ref{sec:main result}, we prove that minimizing the empirical risk with a series of generic optimization methods (left box) is equivalent to using online learning algorithms to solve the regularized bilinear game. Thus, the implicit bias rates can be obtained directly by plugging in the regret bounds.
Figure: Fast margin maximization rates for generic optimization methods and adversarial training.
Figure: No-regret dynamics with weighted OCO for solving $g(\mathbf{p},\mathbf{w})$
Figure: Mirror Descent [Recall $\ell_t(\mathbf{p})=g(\mathbf{p},\mathbf{w}_t)$ and $h_t(\mathbf{w})=-g(\mathbf{p}_t,\mathbf{w})$]
Figure: Momentum-based MD
...and 7 more figures

Theorems & Definitions (19)

Definition 1: $\|\cdot\|$-margin
Definition 2: $\|\cdot\|$-Margin maximization rate and $\|\cdot\|$-directional error
Theorem 1
Lemma 1: Abernethy et al. (2018)
Theorem 2
Theorem 3
Theorem 4
Theorem 5
Theorem 6
Lemma 2: Condition for $\gamma_{2,s}>0$
...and 9 more
