A Mirror Descent Perspective of Smoothed Sign Descent

Shuyang Wang, Diego Klabjan

TL;DR

This work uses the mirror descent framework to study the dynamics of smoothed sign descent with a stability constant $\varepsilon$ for regression problems, and proposes a mirror map that establishes equivalence to dual dynamics under some assumptions.

Abstract

Recent work by Woodworth et al. (2020) shows that the optimization dynamics of gradient descent for overparameterized problems can be viewed as low-dimensional dual dynamics induced by a mirror map, explaining the implicit regularization phenomenon from the mirror descent perspective. However, the methodology does not apply to algorithms where update directions deviate from true gradients, such as ADAM. We use the mirror descent framework to study the dynamics of smoothed sign descent with a stability constant $\varepsilon$ for regression problems. We propose a mirror map that establishes equivalence to dual dynamics under some assumptions. By studying dual dynamics, we characterize the convergent solution as an approximate KKT point of minimizing a Bregman divergence style function, and show the benefit of tuning the stability constant $\varepsilon$ to reduce the KKT error.
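The role of the stability constant $\varepsilon$ can be illustrated with a small discrete-time sketch. This is not the paper's formulation: it assumes the coordinate-wise update $\beta_i \leftarrow \beta_i - \eta \, g_i / (|g_i| + \varepsilon)$, an ADAM-like step without momentum, applied to a squared loss on an overparameterized regression problem. The function names, the step size `lr`, and the toy data are hypothetical. Under this assumed form, a small $\varepsilon$ makes the step behave like sign descent, while a large $\varepsilon$ makes it behave like gradient descent with step size $\eta / \varepsilon$.

```python
def smoothed_sign_step(beta, grad, lr, eps):
    """One assumed update: beta_i <- beta_i - lr * g_i / (|g_i| + eps)."""
    return [b - lr * g / (abs(g) + eps) for b, g in zip(beta, grad)]


def residuals(X, y, beta):
    """Prediction errors x_n . beta - y_n for each sample."""
    return [sum(x_j * b_j for x_j, b_j in zip(x, beta)) - y_n
            for x, y_n in zip(X, y)]


def loss(X, y, beta):
    """Squared loss L(beta) = 0.5 * sum_n (x_n . beta - y_n)^2."""
    return 0.5 * sum(r * r for r in residuals(X, y, beta))


def least_squares_grad(X, y, beta):
    """Gradient of the squared loss: sum_n r_n * x_n."""
    res = residuals(X, y, beta)
    return [sum(r * x[j] for r, x in zip(res, X)) for j in range(len(beta))]


def run_smoothed_sign_descent(X, y, beta0, lr=1e-3, eps=0.1, steps=5000):
    """Iterate the assumed update from beta0 for a fixed number of steps."""
    beta = list(beta0)
    for _ in range(steps):
        grad = least_squares_grad(X, y, beta)
        beta = smoothed_sign_step(beta, grad, lr, eps)
    return beta


if __name__ == "__main__":
    # Toy overparameterized problem: one sample, three parameters,
    # so infinitely many interpolating solutions exist.
    X = [[1.0, 2.0, 0.5]]
    y = [1.0]
    for eps in (0.01, 0.1, 1.0):
        beta = run_smoothed_sign_descent(X, y, [0.0, 0.0, 0.0], eps=eps)
        print(eps, beta, loss(X, y, beta))
```

Running the script with different values of `eps` from the same initialization shows which interpolating solution the iterates reach in each case, which is the kind of dependence on $\varepsilon$ the abstract describes.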

Paper Structure

This paper contains 22 sections, 15 theorems, 124 equations, and 3 figures.

Key Result

Proposition 3.4

For each coordinate $i \in \{1, \dots, D\}$,

Figures (3)

  • Figure 1: Evolution of the primal variable $\bm\beta(t)$ and the dual variable $\nabla \Phi_t (\bm\beta(t))$ in $\mathbb{R}^5$ under smoothed sign descent for different values of the stability constant $\varepsilon$. The vertical line $t=T_0$ marks the transition from the warm-up stage to the sign descent stage, and the line $t=T$ marks the transition to the convergence stage.
  • Figure 2: Trajectories of $\bm\beta(t)$ in $\mathbb{R}^2$ for different values of the stability constant $\varepsilon$.
  • Figure 3: Bregman divergence style function value $E(\bm\beta^\infty, \bm\beta_0)$ of the convergent solutions for different values of the stability constant $\varepsilon$.

Theorems & Definitions (31)

  • Definition 3.1: Bregman divergence
  • Proposition 3.4
  • Proposition 3.5
  • Proposition 3.6: Dual dynamics of smoothed sign descent
  • Proposition 3.7
  • Theorem 3.9
  • Corollary 3.10
  • Corollary 3.11
  • Corollary 3.12
  • proof
  • ...and 21 more