Implicit Bias of Mirror Flow on Separable Data

Scott Pesme; Radu-Alexandru Dragomir; Nicolas Flammarion

TL;DR

The paper studies mirror flow, the continuous-time counterpart of mirror descent, on linearly separable classification problems and shows that the iterates converge in direction towards a $\phi_\infty$-maximum margin classifier.

Abstract

We examine the continuous-time counterpart of mirror descent, namely mirror flow, on classification problems which are linearly separable. Such problems are minimised 'at infinity' and have many possible solutions; we study which solution is preferred by the algorithm depending on the mirror potential. For exponential-tailed losses and under mild assumptions on the potential, we show that the iterates converge in direction towards a $\phi_\infty$-maximum margin classifier. The function $\phi_\infty$ is the horizon function of the mirror potential and characterises its shape 'at infinity'. When the potential is separable, a simple formula allows one to compute this function. We analyse several examples of potentials and provide numerical experiments highlighting our results.
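The setting described in the abstract can be explored with a minimal numerical sketch. The code below runs mirror descent, used here as an explicit discretisation of mirror flow, on the exponential loss. The toy dataset, the step size, the number of steps, the two potentials (quadratic and cosh), and all function names are assumptions chosen for illustration; none of them are taken from the paper.

```python
import numpy as np

# Hypothetical toy data (not from the paper): 4 points in 2d, labels in {-1, +1}.
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def exp_loss(beta):
    """Exponential loss, one example of an exponential-tailed loss."""
    return np.exp(-y * (X @ beta)).mean()

def exp_loss_grad(beta):
    """Gradient of the exponential loss with respect to beta."""
    weights = np.exp(-y * (X @ beta))
    return -(X * (y * weights)[:, None]).mean(axis=0)

def mirror_descent(grad_phi, grad_phi_inv, step=0.1, n_steps=5000):
    """Mirror descent: the dual variable grad_phi(beta) takes gradient steps."""
    beta = np.zeros(X.shape[1])
    z = grad_phi(beta)  # dual (mirror) variable
    for _ in range(n_steps):
        z = z - step * exp_loss_grad(beta)
        beta = grad_phi_inv(z)  # map back to the primal variable
    return beta

# Two illustrative separable potentials, given as (grad phi, inverse of grad phi).
potentials = {
    "quadratic": (lambda b: b, lambda z: z),  # phi = ||beta||^2 / 2 (gradient descent)
    "cosh": (np.sinh, np.arcsinh),            # phi = sum_i cosh(beta_i)
}

results = {}
for name, (grad_phi, grad_phi_inv) in potentials.items():
    beta = mirror_descent(grad_phi, grad_phi_inv)
    results[name] = {
        "loss": exp_loss(beta),
        "min_margin": (y * (X @ beta)).min(),
        "direction": beta / np.linalg.norm(beta),  # normalised iterate
    }
    print(name, results[name])
```

Both runs drive the loss down while the norm of the iterate keeps growing, so only the normalised direction is meaningful to compare across potentials. Convergence in direction can be slow, so the directions printed after a finite number of steps are only indicative of the limit.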

Paper Structure (31 sections, 15 theorems, 60 equations, 4 figures)

Introduction
Informal statement of the main result
Relevance of mirror descent and related work
Relevance of studying mirror descent in the context of machine learning.
Gradient descent in classification.
Notations
Problem set-up
Intuitive construction of the implicit regularisation problem
Preliminaries.
Warm-up: gradient flow
General potential: introducing the horizon function $\phi_\infty$
Main result: directional convergence towards the $\phi_\infty$-max margin
Construction of the horizon function $\phi_\infty$
Horizon shape.
Horizon function.
...and 16 more sections

Key Result

Theorem 1

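There exists a horizon function $\phi_\infty$ such that for any separable dataset, the normalised mirror flow iterates ${\bar{\beta}}_t \coloneqq \beta_t / \Vert \beta_t \Vert$ converge and satisfy:

$\lim_{t \to \infty} {\bar{\beta}}_t \propto \mathrm{arg \ min \ } \phi_\infty({\bar{\beta}})$ under the constraint $\min_i y_i \langle x_i, {\bar{\beta}} \rangle \geq 1$.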
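For orientation, the dynamics behind this statement can be written out. The block below uses the standard definition of mirror flow with potential $\phi$; the symbol $L$ for the training loss and the choice of the quadratic potential as a special case are notational assumptions made here, not taken from the text above.

```latex
% Mirror flow with potential \phi on a loss L (standard definition):
\frac{\mathrm{d}}{\mathrm{d}t} \nabla \phi(\beta_t) = - \nabla L(\beta_t)

% Special case \phi = \tfrac{1}{2} \Vert \cdot \Vert_2^2, where \nabla \phi is the identity:
\frac{\mathrm{d}}{\mathrm{d}t} \beta_t = - \nabla L(\beta_t)
```

The special case is plain gradient flow, which corresponds to the warm-up case listed in the paper structure.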

Figures (4)

Figure 1: Mirror descent is performed using $3$ different potentials on the same toy $2$d dataset. Left: the losses converge to zero. Center: the iterates converge in direction towards $3$ different vectors ${\bar{\beta}}_\infty$; the $3$ lines passing through the origin correspond to the associated separating hyperplanes. Right: the limit directions are each proportional to $\mathrm{arg \ min \ } \phi_\infty({\bar{\beta}})$ under the constraint $\min_i y_i \langle x_i, {\bar{\beta}} \rangle \geq 1$ for their respective $\phi_\infty$'s, as predicted by our theory (Theorem 1). The full trajectories are plotted in Figure 4, and we refer to the experiments section for more details.
Figure 2: Left two: Sketch of the level lines of two different potentials $\phi^{(1)}, \phi^{(2)} : \mathbb{R}^2 \to \mathbb{R}$. Right two: Their corresponding horizon functions $\phi_\infty^{(1)}$, $\phi_\infty^{(2)}$ as defined in the section on the construction of the horizon function $\phi_\infty$.
Figure 3: Illustration of the construction of the horizon shape $S_\infty$. Left: the sub-level sets $S_c$ change shape and grow as $c$ increases. Middle: to keep the shapes from blowing up, we normalise them so that they stay in the unit ball (here the constraining norm is arbitrarily chosen to be the $\ell_1$-norm). Right: the normalised sub-level sets $\bar{S}_c$ converge to a limiting set $S_\infty$ in the Hausdorff distance.
Figure 4: Mirror flow trajectories on a 2-dimensional dataset for three different potentials (exact same setting as in Figure 1). Left: the iterates diverge to infinity and the directional convergence depends on the choice of potential. Right: the normalised iterates converge towards their respective $\phi_\infty$-maximum-margin predictors (illustrated by stars), as predicted by the main theorem.
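The normalisation of sub-level sets described in the Figure 3 caption can be sketched numerically. The code below is a rough illustration under stated assumptions: the potential $\phi(\beta) = \sum_i (\cosh(\beta_i) - 1)$, the bisection solver, the chosen levels $c$, and normalising by the radius along a coordinate axis (instead of the $\ell_1$-norm used in the figure) are all choices made here and are not taken from the paper. It measures how far the boundary of $S_c$ lies from the origin along the diagonal relative to along an axis.

```python
import numpy as np

def phi(beta):
    """Illustrative separable potential: phi(beta) = sum_i (cosh(beta_i) - 1)."""
    return np.sum(np.cosh(beta) - 1.0)

def boundary_radius(c, direction, n_bisect=100):
    """Radius r > 0 such that phi(r * direction) = c, found by bisection."""
    lo, hi = 0.0, 1.0
    while phi(hi * direction) < c:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if phi(mid * direction) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

axis = np.array([1.0, 0.0])
diagonal = np.array([1.0, 1.0]) / np.sqrt(2.0)

levels = [1e1, 1e3, 1e6, 1e12]
ratios = []
for c in levels:
    r_axis = boundary_radius(c, axis)
    r_diag = boundary_radius(c, diagonal)
    # Dividing by r_axis rescales the sub-level set to touch the axis at distance 1.
    ratios.append(r_diag / r_axis)
    print(f"c = {c:.0e}: diagonal / axis radius = {ratios[-1]:.4f}")
```

For a Euclidean ball this ratio would equal 1, and for an axis-aligned square it would equal $\sqrt{2}$. The printed ratios increase towards $\sqrt{2}$ as $c$ grows, which is consistent with a square, $\ell_\infty$-ball-shaped limit for this particular potential; the approach is slow because the radii grow only logarithmically in $c$.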

Theorems & Definitions (24)

Theorem 1: Main result, Informal
Proposition 1
Lemma 1
Corollary 1
Definition 1
Theorem 2
Proposition 2
Theorem 3
Lemma 2
proof
...and 14 more
