How to induce regularization in linear models: A guide to reparametrizing gradient flow

Hung-Hsu Chou, Johannes Maly, Dominik Stöger

TL;DR

This work studies how the model parameters (reparametrization, loss, and link function) influence the convergence behavior of gradient flow, and provides conditions under which the implicit bias can be well-described and convergence of the flow is guaranteed.

Abstract

In this work, we analyze the relation between reparametrizations of gradient flow and the induced implicit bias in linear models, which encompass various basic regression tasks. In particular, we aim at understanding the influence of the model parameters (reparametrization, loss, and link function) on the convergence behavior of gradient flow. Our results provide conditions under which the implicit bias can be well-described and convergence of the flow is guaranteed. We furthermore show how to use these insights for designing reparametrization functions that lead to specific implicit biases that are closely connected to $\ell_p$- or trigonometric regularizers.

Paper Structure (11 sections, 11 theorems, 84 equations, 2 figures)

Key Result

Theorem 1.4

Under Assumption \ref{assumption:Intro}, let $\rho_{\text{rp}}(z) = \mathrm{sign}(z) |z|^{\frac{2}{p}}$, for $p \in (1,2)$, and let ${\bf w} \colon [0,T) \to \mathbb{R}^N$ be any regular solution to \eqref{eq:gd_IRERM} with ${\bf w}_0 = \rho_{\text{rp}}^{-1} (\alpha {\bf 1})$, for $\alpha > 0$, ${\bf A} \in$ […], and assume that […]. Then it holds that […]. In other words, for small $\alpha$, the reparametrized flow […]
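
A short heuristic computation, not taken from the source, may help explain why this choice of $\rho_{\text{rp}}$ is tied to $\ell_p$ geometry. It assumes that the flow in Theorem 1.4 has the form ${\bf w}'(t) = -\nabla_{\bf w}\, \mathcal{L}({\bf A}\rho_{\text{rp}}({\bf w}(t)))$, with $\rho_{\text{rp}}$ applied entrywise and a differentiable loss $\mathcal{L}$ (a hypothetical symbol; the link function is omitted here). Writing ${\bf x} = \rho_{\text{rp}}({\bf w})$ and ${\bf g} = \nabla \mathcal{L}({\bf A}{\bf x})$, the chain rule gives, for every coordinate with $x_i \neq 0$,

$$
\dot{x}_i = \rho_{\text{rp}}'(w_i)\, \dot{w}_i = -\rho_{\text{rp}}'(w_i)^2\, ({\bf A}^T {\bf g})_i,
\qquad
\rho_{\text{rp}}'(w_i)^2 = \tfrac{4}{p^2} |w_i|^{\frac{4}{p} - 2} = \tfrac{4}{p^2} |x_i|^{2-p}.
$$

Since $\frac{d^2}{dx^2} |x|^p = p(p-1)|x|^{p-2}$, the prefactor $\tfrac{4}{p^2}|x_i|^{2-p}$ is the inverse of the second derivative of $\Phi({\bf x}) = \frac{p}{4(p-1)} \|{\bf x}\|_p^p$ in the $i$-th coordinate. Under the stated assumptions, ${\bf x}$ therefore formally follows a mirror flow whose potential is proportional to $\|{\bf x}\|_p^p$:

$$
\frac{d}{dt} \nabla \Phi({\bf x}(t)) = -{\bf A}^T {\bf g}(t).
$$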

Figures (2)

  • Figure 1: Implicit regularization of gradient descent for $\rho_{\text{rp}}(z) = \mathrm{sign}(z)|z|^{\frac{2}{p}}$ with learning rate $10^{-4}$ initialized at $(10^{-4},10^{-4})^T$. In Figure \ref{fig:p=1.2}, we set ${\bf A} = (-0.7,1) \in \mathbb{R}^{1\times 2}$, ${\bf y} = 2$, and $L(z,y) = |z-y|^{1.1}$. In Figure \ref{fig:p=1.8}, we set ${\bf A} = (-0.7,-1) \in \mathbb{R}^{1\times 2}$, ${\bf y} = 2$, and $L(z,y) = |z-y|^2$. The depicted $\ell_p$-ball is scaled to the final GD iterate.
  • Figure 2: Comparison between $g_{\text{sinh}}$ and the Huber function \cite{Huber1964} with parameter $\frac{\pi}{2}$.
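
The second setting described in the caption of Figure 1 (${\bf A} = (-0.7,-1)$, ${\bf y} = 2$, $L(z,y) = |z-y|^2$, learning rate $10^{-4}$) can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not the authors' code: it assumes that $p = 1.8$ for this panel (inferred from the figure label), that gradient descent is run on ${\bf w} \mapsto L({\bf A}\rho_{\text{rp}}({\bf w}), {\bf y})$ with $\rho_{\text{rp}}$ applied entrywise, and that the stated initialization refers to $\rho_{\text{rp}}({\bf w}_0) = \alpha {\bf 1}$ with $\alpha = 10^{-4}$. The function names, the step count, and the grid-search helper `min_lp_on_line` are hypothetical and serve only as a reference point for comparison.

```python
import math

def rho(w, p):
    """rho_rp(w) = sign(w) * |w|**(2/p), applied to one coordinate."""
    return math.copysign(abs(w) ** (2.0 / p), w)

def rho_prime(w, p):
    """Derivative of rho_rp: (2/p) * |w|**(2/p - 1)."""
    return (2.0 / p) * abs(w) ** (2.0 / p - 1.0)

def rho_inv(x, p):
    """Inverse of rho_rp: sign(x) * |x|**(p/2)."""
    return math.copysign(abs(x) ** (p / 2.0), x)

def run_gd(A, y, p, alpha=1e-4, lr=1e-4, steps=100_000):
    """Gradient descent on w -> (A . rho(w) - y)**2 for a single measurement."""
    # Assumption: the initialization is given in x-coordinates,
    # so w_0 = rho^{-1}(alpha * 1).
    w = [rho_inv(alpha, p) for _ in A]
    for _ in range(steps):
        x = [rho(wi, p) for wi in w]
        residual = sum(a * xi for a, xi in zip(A, x)) - y
        # Chain rule: d/dw_i residual**2 = 2 * residual * A_i * rho'(w_i).
        w = [wi - lr * 2.0 * residual * a * rho_prime(wi, p)
             for wi, a in zip(w, A)]
    return [rho(wi, p) for wi in w]

def min_lp_on_line(A, y, p, lo=-5.0, hi=5.0, n=20_001):
    """Grid search for the minimizer of |x1|^p + |x2|^p on the line A . x = y."""
    best_val, best_x = None, None
    for k in range(n):
        x1 = lo + (hi - lo) * k / (n - 1)
        x2 = (y - A[0] * x1) / A[1]  # solve the constraint for x2
        val = abs(x1) ** p + abs(x2) ** p
        if best_val is None or val < best_val:
            best_val, best_x = val, (x1, x2)
    return best_x

A, y, p = (-0.7, -1.0), 2.0, 1.8
x_gd = run_gd(A, y, p)          # final iterate mapped back through rho_rp
x_lp = min_lp_on_line(A, y, p)  # reference point on the solution line
print("final GD iterate rho(w):", x_gd)
print("grid minimizer of the l_p norm on A.x = y:", x_lp)
```

Printing both points makes it possible to check by eye how close the final iterate lies to the $\ell_p$-minimal point on the solution line, and rerunning with a different `alpha` shows how the initialization scale affects that gap.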

Theorems & Definitions (28)

  • Definition 1.1
  • Remark 1.2
  • Theorem 1.4
  • Theorem 1.5
  • Remark 1.6
  • Remark 2.2
  • Definition 2.3: Bregman Divergence
  • Theorem 2.4: Implicit Bias
  • Definition 2.5
  • Remark 2.6
  • ...and 18 more