Implicit Bias of Mirror Flow on Separable Data

Scott Pesme, Radu-Alexandru Dragomir, Nicolas Flammarion

TL;DR

This paper studies the continuous-time counterpart of mirror descent, namely mirror flow, on linearly separable classification problems, and shows that the iterates converge in direction towards a $\phi_\infty$-maximum margin classifier.

Abstract

We examine the continuous-time counterpart of mirror descent, namely mirror flow, on classification problems which are linearly separable. Such problems are minimised 'at infinity' and have many possible solutions; we study which solution is preferred by the algorithm depending on the mirror potential. For exponential-tailed losses and under mild assumptions on the potential, we show that the iterates converge in direction towards a $\phi_\infty$-maximum margin classifier. The function $\phi_\infty$ is the horizon function of the mirror potential and characterises its shape 'at infinity'. When the potential is separable, a simple formula allows this function to be computed. We analyse several examples of potentials and provide numerical experiments highlighting our results.
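
For reference, mirror flow and its discretisation, mirror descent, are usually written as follows. The symbols $L$ (training loss), $\ell$ (per-sample loss), $n$ (number of samples) and $\gamma$ (step size) are notation assumed here rather than taken from this page; $\phi$ is the mirror potential.

$$\frac{\mathrm{d}}{\mathrm{d}t} \nabla\phi(\beta_t) = -\nabla L(\beta_t), \qquad L(\beta) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(y_i \langle x_i, \beta \rangle\big)$$

$$\nabla\phi(\beta_{k+1}) = \nabla\phi(\beta_k) - \gamma \nabla L(\beta_k)$$

With the quadratic potential $\phi = \frac{1}{2} \Vert \cdot \Vert_2^2$, these reduce to gradient flow and gradient descent respectively.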

Paper Structure (31 sections, 15 theorems, 60 equations, 4 figures)

Key Result

Theorem 1

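There exists a horizon function $\phi_\infty$ such that for any separable dataset, the normalised mirror flow iterates ${\bar{\beta}}_t \coloneqq \beta_t / \Vert \beta_t \Vert$ converge and satisfy:

$$\lim_{t \to \infty} {\bar{\beta}}_t \propto \operatorname{arg\,min}_{\beta \,:\, \min_i y_i \langle x_i, \beta \rangle \geq 1} \phi_\infty(\beta)$$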

Figures (4)

  • Figure 1: Mirror descent is performed using $3$ different potentials on the same toy $2$d dataset. Left: the losses converge to zero. Center: the iterates converge in direction towards $3$ different vectors ${\bar{\beta}}_\infty$; the $3$ lines passing through the origin are the associated separating hyperplanes. Right: each limit direction is proportional to $\operatorname{arg\,min} \phi_\infty({\bar{\beta}})$ under the constraint $\min_i y_i \langle x_i, {\bar{\beta}} \rangle \geq 1$ for its respective $\phi_\infty$, as predicted by our theory (Theorem 1). The full trajectories are plotted in Figure 4, and we refer to the experiments section for more details.
  • Figure 2: Left two: Sketch of the level lines of two different potentials $\phi^{(1)}, \phi^{(2)} : \mathbb{R}^2 \to \mathbb{R}$. Right two: Their corresponding horizon functions $\phi_\infty^{(1)}$, $\phi_\infty^{(2)}$, as defined in the subsection on the horizon function.
  • Figure 3: Illustration of the construction of the horizon shape $S_\infty$. Left: the sub-level sets $S_c$ change shape and grow as $c$ increases. Middle: to prevent the shapes from blowing up, we normalise them so that they stay in the unit ball (here the constraining norm is arbitrarily chosen to be the $\ell_1$-norm). Right: the normalised sub-level sets $\bar{S}_c$ converge to a limiting set $S_\infty$ in the Hausdorff distance.
  • Figure 4: Mirror flow trajectories on a 2-dimensional dataset for three different potentials (exact same setting as in Figure 1). Left: the iterates diverge to infinity and the directional convergence depends on the choice of potential. Right: the normalised iterates converge towards their respective $\phi_\infty$-maximum-margin predictors (illustrated by stars), as predicted by the main theorem.
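
The kind of experiment described in Figures 1 and 4 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the toy dataset, the step size, the number of steps, and the two potentials (quadratic and a coordinate-wise cosh potential, chosen only because their mirror maps invert in closed form) are all hypothetical choices, and the exponential loss stands in for a generic exponential-tailed loss.

```python
import numpy as np

# Hypothetical toy 2d dataset, linearly separable (not the paper's data).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])


def exp_loss_and_grad(beta, X, y):
    """Exponential loss (1/n) * sum_i exp(-y_i <x_i, beta>) and its gradient."""
    weights = np.exp(-y * (X @ beta))
    loss = weights.mean()
    grad = -(X * (y * weights)[:, None]).mean(axis=0)
    return loss, grad


def mirror_descent(X, y, grad_phi, grad_phi_inv, step=0.1, n_steps=2000):
    """Iterate grad_phi(beta_{k+1}) = grad_phi(beta_k) - step * grad L(beta_k)."""
    beta = np.zeros(X.shape[1])
    z = grad_phi(beta)  # dual variable z = grad_phi(beta)
    losses = []
    for _ in range(n_steps):
        loss, grad = exp_loss_and_grad(beta, X, y)
        losses.append(loss)
        z = z - step * grad       # gradient step in the dual space
        beta = grad_phi_inv(z)    # map back to the primal space
    return beta, np.array(losses)


# Illustrative potentials, given as (grad_phi, inverse of grad_phi):
#   quadratic: phi(b) = 0.5 * ||b||^2      -> grad_phi is the identity
#   cosh:      phi(b) = sum_j cosh(b_j)    -> grad_phi = sinh, inverse = arcsinh
potentials = {
    "quadratic": (lambda b: b, lambda z: z),
    "cosh": (np.sinh, np.arcsinh),
}

results = {}
for name, (grad_phi, grad_phi_inv) in potentials.items():
    beta, losses = mirror_descent(X, y, grad_phi, grad_phi_inv)
    direction = beta / np.linalg.norm(beta)  # normalised iterate
    results[name] = {"beta": beta, "direction": direction, "losses": losses}
    print(name, "final loss:", losses[-1], "direction:", direction)
```

With the quadratic potential the loop is plain gradient descent. Comparing the printed directions across potentials gives a rough, finite-time view of the dependence on the potential that the figures illustrate; directional convergence can be slow, so the printed values are only approximations of the limits.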

Theorems & Definitions (24)

  • Theorem 1: Main result, Informal
  • Proposition 1
  • Lemma 1
  • Corollary 1
  • Definition 1
  • Theorem 2
  • Proposition 2
  • Theorem 3
  • Lemma 2
  • proof
  • ...and 14 more