Regularization implies balancedness in the deep linear network

Kathryn Lindsey, Govind Menon

TL;DR

This work uses geometric invariant theory (GIT) to study the deep linear network (DLN) and shows that the regularizing flow is exactly solvable using the moment map.

Abstract

We use geometric invariant theory (GIT) to study the deep linear network (DLN). The Kempf-Ness theorem is used to establish that the $L^2$ regularizer is minimized on the balanced manifold. This allows us to decompose the training dynamics into two distinct gradient flows: a regularizing flow on fibers and a learning flow on the balanced manifold. We show that the regularizing flow is exactly solvable using the moment map. This approach provides a common mathematical framework for balancedness in deep learning and linear systems theory. We use this framework to interpret balancedness in terms of model reduction and Bayesian principles.
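
The minimization claim can be checked numerically in the simplest matrix setting. The sketch below is an illustration, not the paper's construction. It assumes $N=2$ real square layers, takes the balanced condition to be $W_2^T W_2 = W_1 W_1^T$, and uses hypothetical helper names (`balanced_factorization`, `l2_regularizer`, `compare_on_fiber`). It builds a balanced factorization of $X$ from the SVD and compares its regularizer with that of another point on the same fiber $W_2 W_1 = X$.

```python
import numpy as np


def balanced_factorization(X):
    """Return (W1, W2) with W2 @ W1 = X and W2.T @ W2 = W1 @ W1.T (assumed N = 2)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    root = np.sqrt(s)
    W2 = U * root              # U @ diag(sqrt(s))
    W1 = root[:, None] * Vt    # diag(sqrt(s)) @ Vt
    return W1, W2


def l2_regularizer(*Ws):
    """Sum of squared Frobenius norms of the layers."""
    return sum(float(np.sum(W ** 2)) for W in Ws)


def compare_on_fiber(d=3, seed=0):
    """Build a balanced point and an unbalanced point on the same fiber."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((d, d))
    W1, W2 = balanced_factorization(X)
    # A generic invertible matrix moves the point along the fiber:
    # (W1, W2) -> (A W1, W2 A^{-1}) leaves the product W2 W1 unchanged.
    A = rng.standard_normal((d, d)) + d * np.eye(d)
    V1, V2 = A @ W1, W2 @ np.linalg.inv(A)
    return X, (W1, W2), (V1, V2)


if __name__ == "__main__":
    X, balanced, moved = compare_on_fiber()
    print("balanced regularizer:", l2_regularizer(*balanced))
    print("moved regularizer:   ", l2_regularizer(*moved))
```

Under these assumptions the balanced value equals twice the sum of the singular values of $X$, and the moved point should never fall below it.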

Paper Structure

This paper contains 26 sections, 12 theorems, 66 equations, 3 figures.

Key Result

Theorem 1

Assume $X$ has full rank. Then

Figures (3)

  • Figure 1.1: The orthogonal foliation of $\mathbb{M}_d^N$ by the balanced varieties $\mathcal{M}_{\mathbf{G}}$ and fibers $\mathcal{F}_X$ in the simplest case ($d=1$, $N=2$, real matrices). The regularizing flow lies on the hyperbola $w_2w_1=x$. The learning flow lives on the asymptotes $w_2 = \pm w_1$. It is intuitively clear that the minimizers of $|w|^2$ on the fiber $w_2w_1=x$ are the points $(\pm \sqrt{x},\pm \sqrt{x})$. Theorem \ref{thm:intro} establishes the analogous property in general.
  • Figure 1.2: The regularizing flow (see Theorem \ref{thm:reg-flow}) on $\mathcal{F}_X$. When $d\geq 2$, the fiber $\mathcal{F}_X$ is sliced by the moments $\mathbf{G}$ into topologically equivalent components $\mathcal{F}_X \cap \mathcal{M}_\mathbf{G}$. The regularizing flow evolves the slices at a uniform exponential rate towards the minimizing orbit $\mathcal{O}_X$ corresponding to $\mathbf{G}=\mathbf{0}$.
  • Figure 1.3: The learning flow. There are two equivalent descriptions. First, the balanced manifold $\mathcal{M}$ is invariant under the gradient flow $\dot{\mathbf{W}} =-\nabla_\mathbf{W} E(X(\mathbf{W}))$ of the cost function. Second, the dynamics of the end-to-end matrix $X$ are given by the Riemannian gradient flow $\dot{X}=-\mathrm{grad}_{g^N} E(X)$ on $(\mathbb{M}_d,g^N)$, where the manifold $(\mathbb{M}_d,g^N)$ is obtained by Riemannian submersion from $(\mathcal{M},\iota)$.
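
The scalar picture of Figure 1.1 can be simulated directly. The following is a minimal sketch under stated assumptions, not the paper's definition of the regularizing flow: it takes $d=1$, $N=2$, $x>0$, the Euclidean metric, and the function $f(w)=w_1^2+w_2^2$, and follows the gradient flow of $f$ restricted to the hyperbola $w_2w_1=x$. The function name `regularizing_flow_scalar` and the imbalance $g = w_1^2 - w_2^2$ (a scalar stand-in for the moment $\mathbf{G}$) are introduced here for illustration. With this normalization one finds $\dot g = -4g$, so the rate constant 4 is specific to these choices.

```python
import math


def regularizing_flow_scalar(w1, w2, t_end=1.0, steps=20000):
    """Follow the gradient flow of f = w1^2 + w2^2 restricted to the fiber w2*w1 = x.

    The tangent to the hyperbola at (w1, w2) is (w1, -w2). Projecting -grad f
    onto it gives dw1/dt = c*w1, dw2/dt = -c*w2 with c = -2*g/(w1^2 + w2^2),
    where g = w1^2 - w2^2 is the imbalance (assumed scalar analogue of G).
    """
    dt = t_end / steps
    for _ in range(steps):
        g = w1 * w1 - w2 * w2
        r2 = w1 * w1 + w2 * w2
        c = -2.0 * g / r2
        # Multiplicative update: scales w1 by a and w2 by 1/a, so the product
        # w2*w1 is preserved by every step up to rounding.
        a = math.exp(c * dt)
        w1, w2 = w1 * a, w2 / a
    return w1, w2


if __name__ == "__main__":
    w1, w2 = regularizing_flow_scalar(2.0, 0.5, t_end=5.0)
    print("end point:", (w1, w2), "product:", w1 * w2)
```

Starting from $(2, 0.5)$ on the fiber $x=1$, the trajectory should approach the balanced point $(\sqrt{x},\sqrt{x})$ while the product $w_2w_1$ stays fixed, which mirrors the uniform exponential decay described in the caption of Figure 1.2.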

Theorems & Definitions (32)

  • Theorem 1
  • Theorem 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Remark 6
  • Remark 7
  • Lemma 1
  • proof
  • Lemma 2
  • ...and 22 more