The geometry of the deep linear network

Govind Menon

TL;DR

This article gives an expository account of training dynamics in the Deep Linear Network (DLN) from the perspective of the geometric theory of dynamical systems, including exact formulas for a Boltzmann entropy and for stochastic gradient descent of free energy.

Abstract

This article provides an expository account of training dynamics in the Deep Linear Network (DLN) from the perspective of the geometric theory of dynamical systems. Rigorous results by several authors are unified into a thermodynamic framework for deep learning. The analysis begins with a characterization of the invariant manifolds and Riemannian geometry in the DLN. This is followed by exact formulas for a Boltzmann entropy, as well as stochastic gradient descent of free energy using a Riemannian Langevin Equation. Several links between the DLN and other areas of mathematics are discussed, along with some open questions.
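
The invariant manifolds mentioned above can be illustrated numerically. The sketch below is not taken from the article: it assumes the standard DLN setup from the literature, with end-to-end matrix $W = W_N \cdots W_1$ and layer-wise gradient descent, and it assumes a quadratic energy $E(W) = \tfrac12\|W - \Phi\|_F^2$ chosen purely for illustration. Under gradient flow the matrices $G_p = W_{p+1}^T W_{p+1} - W_p W_p^T$ are conserved; with a small discrete step they should drift only slightly. The helper names (`chain`, `balance`, `gd_step`) and the target `Phi` are hypothetical.

```python
import random

def matmul(A, B):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def chain(mats, d):
    """Product mats[-1] @ ... @ mats[0]; the identity for an empty list."""
    P = identity(d)
    for M in mats:
        P = matmul(M, P)
    return P

def energy(Ws, Phi, d):
    """Assumed illustrative energy: 0.5 * ||W_N ... W_1 - Phi||_F^2."""
    R = sub(chain(Ws, d), Phi)
    return 0.5 * sum(r * r for row in R for r in row)

def balance(Ws):
    """G_p = W_{p+1}^T W_{p+1} - W_p W_p^T for each pair of consecutive layers."""
    return [sub(matmul(transpose(Ws[p + 1]), Ws[p + 1]),
                matmul(Ws[p], transpose(Ws[p])))
            for p in range(len(Ws) - 1)]

def gd_step(Ws, Phi, d, eta):
    """One explicit gradient-descent step on every layer simultaneously."""
    R = sub(chain(Ws, d), Phi)            # gradient of E with respect to W
    new_Ws = []
    for p, Wp in enumerate(Ws):
        left = chain(Ws[p + 1:], d)       # product of the layers after layer p
        right = chain(Ws[:p], d)          # product of the layers before layer p
        grad = matmul(matmul(transpose(left), R), transpose(right))
        new_Ws.append([[w - eta * g for w, g in zip(rw, rg)]
                       for rw, rg in zip(Wp, grad)])
    return new_Ws

random.seed(0)
d, N, eta, steps = 2, 3, 1e-3, 2000
# Near-identity random initialization (an arbitrary choice for this sketch).
Ws = [[[(1.0 if i == j else 0.0) + 0.1 * random.gauss(0, 1) for j in range(d)]
       for i in range(d)] for _ in range(N)]
Phi = [[2.0, 0.0], [0.0, 0.5]]            # hypothetical target matrix

G0 = balance(Ws)
e0 = energy(Ws, Phi, d)
for _ in range(steps):
    Ws = gd_step(Ws, Phi, d, eta)
e1 = energy(Ws, Phi, d)

# Largest entrywise change in any G_p over the whole run.
drift = max(abs(a - b)
            for Ga, Gb in zip(balance(Ws), G0)
            for ra, rb in zip(Ga, Gb)
            for a, b in zip(ra, rb))
print("energy before/after:", e0, e1)
print("max drift in G_p:", drift)
```

The first-order change in each $G_p$ cancels exactly, so the remaining drift comes from the second-order error of the explicit step and shrinks with `eta`. Each fixed value of $(G_1, \dots, G_{N-1})$ labels one leaf of the kind of foliation described in the caption of Figure 9.1.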

Paper Structure

This paper contains 54 sections, 12 theorems, 153 equations, 5 figures.

Key Result

Theorem 1

The following hold on the maximal interval of existence of solutions to equation \ref{eq:ivp}.

Figures (5)

  • Figure 9.1: The foliation of $\mathbb{M}_d^N$ by the balanced varieties $\mathcal{M}_{\mathbf{G}}$ may be visualized as a foliation by conic sections (see equation \ref{eq:b6q}).
  • Figure 9.2: In comparison with Figure \ref{fig:geom1}, we now blow up the balanced manifold $\mathcal{M}$ and visualize the foliation into group orbits $\mathcal{O}_W$ by slicing $\mathcal{M}$.
  • Figure 9.3: Motion by (minus one half) curvature arising from tangential Brownian fluctuations, as discussed in Section \ref{subsec:spheres}.
  • Figure 9.4: Riemannian submersion $\phi:\mathcal{M} \to \mathfrak{M}_d$ and the dynamics upstairs and downstairs. In this image, we illustrate the RLE with stochastic dynamics upstairs and deterministic gradient descent of free energy downstairs. See equations \ref{eq:rle-up2} and \ref{eq:rle-down2}.
  • Figure 9.5: A rank-one variety within the zero energy set for matrix completion. See equations \ref{eq:mc4}--\ref{eq:mc5}.

Theorems & Definitions (23)

  • Theorem 1: Arora, Cohen, Hazan [ACH]
  • Theorem 2: Arora, Cohen, Hazan [ACH]
  • Theorem 3: Bah, Rauhut, Terstiege, Westdickenberg [Bah2019]
  • Remark 4
  • Remark 5: Thin gradients
  • Remark 6: Group orbits
  • Remark 7: Global existence and convergence
  • Remark 8: Relation to the Simons cone
  • Remark 9
  • Theorem 10: Menon, Yu [MY-dln]
  • ...and 13 more