How to induce regularization in linear models: A guide to reparametrizing gradient flow

Hung-Hsu Chou; Johannes Maly; Dominik Stöger

TL;DR

This work aims to understand how the model parameters (reparametrization, loss, and link function) influence the convergence behavior of gradient flow, and provides conditions under which the implicit bias can be well described and convergence of the flow is guaranteed.

Abstract

In this work, we analyze the relation between reparametrizations of gradient flow and the induced implicit bias in linear models, which encompass various basic regression tasks. In particular, we aim to understand the influence of the model parameters (reparametrization, loss, and link function) on the convergence behavior of gradient flow. Our results provide conditions under which the implicit bias can be well described and convergence of the flow is guaranteed. We furthermore show how to use these insights to design reparametrization functions that lead to specific implicit biases closely connected to $\ell_p$- or trigonometric regularizers.
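As a rough illustration of the design direction described in the abstract, the sketch below assumes a componentwise reparametrization ${\bf x} = \rho({\bf w})$ with $\rho(0) = 0$ and an identity link, so that gradient flow in ${\bf w}$ becomes a mirror flow in ${\bf x}$ whose potential $\Phi$ satisfies $\Phi''(x) = 1/\rho'(\rho^{-1}(x))^2$. Under that assumption, a target regularizer $\Phi$ determines the reparametrization through $\rho^{-1}(x) = \int_0^x \sqrt{\Phi''(s)}\,ds$. The helper `inverse_reparam_from_potential` is hypothetical (not taken from the paper) and evaluates this integral with a midpoint rule; treat it as a sketch under the stated assumption, not as the authors' construction.

```python
import numpy as np


def inverse_reparam_from_potential(phi_dd, x, n=200_000):
    """Hypothetical helper: rho^{-1}(x) = sign(x) * int_0^{|x|} sqrt(phi_dd(s)) ds.

    Assumes phi_dd (the second derivative of the target regularizer) is even,
    so only s >= 0 is evaluated. The midpoint rule never evaluates s = 0,
    where phi_dd may be singular (e.g. |s|^(p-2) for p < 2).
    """
    x = float(x)
    if x == 0.0:
        return 0.0
    h = abs(x) / n
    s = (np.arange(n) + 0.5) * h  # midpoints of the n cells covering [0, |x|]
    return float(np.sign(x) * h * np.sum(np.sqrt(phi_dd(s))))


# Example target: Phi(x) = p / (4 (p - 1)) * |x|^p, so Phi''(x) = (p^2 / 4) |x|^(p - 2).
p = 1.5


def lp_potential_dd(s):
    return (p ** 2 / 4.0) * np.abs(s) ** (p - 2.0)


for x in (0.7, -2.0):
    numeric = inverse_reparam_from_potential(lp_potential_dd, x)
    closed_form = np.sign(x) * abs(x) ** (p / 2.0)  # sign(x) |x|^(p/2)
    print(f"x = {x:+.1f}: quadrature {numeric:+.5f}, closed form {closed_form:+.5f}")
```

For this $\ell_p$-type target the integral evaluates to $\mathrm{sign}(x)|x|^{p/2}$, which is exactly the inverse of $\rho_{\text{rp}}(z) = \mathrm{sign}(z)|z|^{\frac{2}{p}}$ from Theorem 1.4; other even, positive choices of $\Phi''$ can be plugged in the same way.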

Paper Structure

This paper contains 11 sections, 11 theorems, 84 equations, and 2 figures.

Introduction
Contribution
Related work
Outline and Notation
Reparametrizing gradient flow on GLMs
General framework and main result
How to induce specific regularization
Proofs
Proof of Theorem [theorem:vector_IRERM]
Proof of Theorem [thm:Convergence]
Proofs of Corollaries [cor:PolynomialRegularization] and [cor:TrigonometricRegularization]

Key Result

Theorem 1.4

Under Assumption [assumption:Intro], let $\rho_{\text{rp}}(z) = \mathrm{sign}(z) |z|^{\frac{2}{p}}$ for $p \in (1,2)$, and let ${\bf w} \colon [0,T) \to \mathbb{R}^N$ be any regular solution to [eq:gd_IRERM] with ${\bf w}_0 = \rho_{\text{rp}}^{-1} (\alpha {\bf 1})$ for $\alpha > 0$ and ${\bf A} \in \mathbb{…}$, and assume that […]. Then it holds that […]. In other words, for small $\alpha$, the reparametrized flow […]
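The chain-rule computation below is a heuristic sketch, not the paper's proof. It assumes that [eq:gd_IRERM] is gradient flow on ${\bf w} \mapsto \mathcal{F}(\rho_{\text{rp}}({\bf w}))$ with the hypothetical shorthand $\mathcal{F}({\bf x}) = \sum_i L(({\bf A}{\bf x})_i, y_i)$ (identity link), and that all entries of ${\bf w}$ are nonzero. It indicates why this particular $\rho_{\text{rp}}$ is tied to an $\ell_p$-type regularizer.

```latex
\begin{aligned}
\dot{\bf w} &= -\rho_{\text{rp}}'({\bf w}) \odot \nabla\mathcal{F}({\bf x}), \quad {\bf x} = \rho_{\text{rp}}({\bf w})
&&\Longrightarrow\quad \dot{\bf x} = \rho_{\text{rp}}'({\bf w}) \odot \dot{\bf w} = -\rho_{\text{rp}}'({\bf w})^{2} \odot \nabla\mathcal{F}({\bf x}),\\
\rho_{\text{rp}}'(w) &= \tfrac{2}{p}\,|w|^{\frac{2}{p}-1}, \quad |w| = |x|^{\frac{p}{2}}
&&\Longrightarrow\quad \rho_{\text{rp}}'(w)^{2} = \tfrac{4}{p^{2}}\,|x|^{2-p},\\
\Phi({\bf x}) &= \tfrac{p}{4(p-1)}\,\|{\bf x}\|_{p}^{p}
&&\Longrightarrow\quad \tfrac{d}{dt}\nabla\Phi({\bf x}) = \operatorname{diag}\!\Big(\tfrac{p^{2}}{4}|x_j|^{p-2}\Big)\,\dot{\bf x} = -\nabla\mathcal{F}({\bf x}).
\end{aligned}
```

So, under these assumptions, the reparametrized flow behaves like a mirror flow with potential $\Phi \propto \|\cdot\|_p^p$. Since $\nabla\Phi(\alpha{\bf 1})$ scales like $\alpha^{p-1}$ with $p > 1$, the Bregman divergence $D_\Phi(\cdot, \alpha{\bf 1})$ approaches $\Phi$ as $\alpha \to 0$, which is in line with the small-$\alpha$ reading of the theorem.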

Figures (2)

Figure 1: Implicit regularization of gradient descent for $\rho_{\text{rp}}(z) = \mathrm{sign}(z)|z|^{\frac{2}{p}}$ with learning rate $10^{-4}$, initialized at $(10^{-4},10^{-4})^T$. In the panel for $p = 1.2$, we set ${\bf A} = (-0.7,1) \in \mathbb{R}^{1\times 2}$, ${\bf y} = 2$, and $L(z,y) = |z-y|^{1.1}$. In the panel for $p = 1.8$, we set ${\bf A} = (-0.7,-1) \in \mathbb{R}^{1\times 2}$, ${\bf y} = 2$, and $L(z,y) = |z-y|^2$. The depicted $\ell_p$-ball is scaled to the final GD iterate.
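A minimal sketch of the square-loss setting from Figure 1 ($p = 1.8$) follows. It assumes the objective is ${\bf w} \mapsto L({\bf A}\rho_{\text{rp}}({\bf w}), {\bf y})$ with $L(z,y) = |z-y|^2$ and identity link, and it reads the caption's initialization as ${\bf x}_0 = \rho_{\text{rp}}({\bf w}_0) = \alpha{\bf 1}$ with $\alpha = 10^{-4}$, matching ${\bf w}_0 = \rho_{\text{rp}}^{-1}(\alpha{\bf 1})$ in Theorem 1.4; whether the caption's vector refers to ${\bf w}_0$ or ${\bf x}_0$ is an assumption. The function names are hypothetical. For comparison, the sketch also computes the minimum $\ell_p$-norm solution of the single equation $\langle {\bf a}, {\bf x}\rangle = y$, which by a Lagrange-multiplier argument is $x_j = y\,\mathrm{sign}(a_j)|a_j|^{1/(p-1)} / \sum_k |a_k|^{p/(p-1)}$.

```python
import numpy as np


def rho(w, p):
    """Reparametrization rho_rp(w) = sign(w) |w|^(2/p), applied entrywise."""
    return np.sign(w) * np.abs(w) ** (2.0 / p)


def rho_inv(x, p):
    """Inverse map rho_rp^{-1}(x) = sign(x) |x|^(p/2)."""
    return np.sign(x) * np.abs(x) ** (p / 2.0)


def rho_prime(w, p):
    """Derivative (2/p) |w|^(2/p - 1); continuous and zero at w = 0 for p in (1, 2)."""
    return (2.0 / p) * np.abs(w) ** (2.0 / p - 1.0)


def reparam_gd(A, y, p, alpha=1e-4, lr=1e-4, steps=200_000):
    """Hypothetical helper: gradient descent on w -> ||A rho(w) - y||^2 (square loss)."""
    w = rho_inv(alpha * np.ones(A.shape[1]), p)  # w_0 = rho^{-1}(alpha 1), i.e. x_0 = alpha 1
    for _ in range(steps):
        x = rho(w, p)
        grad_x = 2.0 * A.T @ (A @ x - y)       # gradient of the square loss w.r.t. x
        w = w - lr * rho_prime(w, p) * grad_x  # chain rule: dF/dw = rho'(w) * dF/dx
    return rho(w, p)


def min_lp_solution(a, y, p):
    """Minimum l_p-norm solution of the single equation <a, x> = y (all a_j nonzero)."""
    q = p / (p - 1.0)  # Holder conjugate exponent
    return y * np.sign(a) * np.abs(a) ** (1.0 / (p - 1.0)) / np.sum(np.abs(a) ** q)


A = np.array([[-0.7, -1.0]])
y = np.array([2.0])
p = 1.8
x_gd = reparam_gd(A, y, p, lr=1e-3, steps=20_000)
x_lp = min_lp_solution(A[0], y[0], p)
x_l2 = np.linalg.pinv(A) @ y  # minimum l_2-norm solution, for contrast
print("GD limit:", x_gd, " min l_p:", x_lp, " min l_2:", x_l2)
```

The usage example runs with step size $10^{-3}$ instead of the caption's $10^{-4}$ only to keep the loop short. In this setup we expect the final iterate to satisfy the constraint and to sit near the minimum $\ell_{1.8}$-norm interpolant rather than the minimum $\ell_2$-norm one, although the two are fairly close for this ${\bf A}$.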
Figure 2: Comparison between $g_\text{sinh}$ and the Huber function (Huber, 1964) with parameter $\frac{\pi}{2}$.

Theorems & Definitions (28)

Definition 1.1
Remark 1.2
Theorem 1.4
Theorem 1.5
Remark 1.6
Remark 2.2
Definition 2.3: Bregman Divergence
Theorem 2.4: Implicit Bias
Definition 2.5
Remark 2.6
...and 18 more
