An entropy formula for the Deep Linear Network

Govind Menon; Tianmin Yu

TL;DR

The paper develops a rigorous geometric and thermodynamic framework for the deep linear network (DLN). It uses group actions to describe overparametrization and Riemannian submersion to relate the geometry of parameter space to the observable end-to-end matrix X. It derives an explicit Boltzmann entropy S(X), expressed in terms of the singular values of X and Vandermonde determinants, and relates this entropy to a Riemannian Langevin dynamics on the gauge group that is treated in companion work. The central technical step is an explicit orthonormal basis for the tangent space of the balanced manifold, built from Jacobi matrices. This basis makes the pullback metric computable and yields the submersion theorem that accounts for the metric structure on the space of observables. The results clarify how depth and overparametrization shape training dynamics, implicit bias, and entropic regularization, and they link random matrix theory, geometric control, and learning theory, with possible implications for stochastic gradient flow and regularization in deep learning.

Abstract

We study the Riemannian geometry of the Deep Linear Network (DLN) as a foundation for a thermodynamic description of the learning process. The main tools are the use of group actions to analyze overparametrization and the use of Riemannian submersion from the space of parameters to the space of observables. The foliation of the balanced manifold in the parameter space by group orbits is used to define and compute a Boltzmann entropy. We also show that the Riemannian geometry on the space of observables defined in [2] is obtained by Riemannian submersion of the balanced manifold. The main technical step is an explicit construction of an orthonormal basis for the tangent space of the balanced manifold using the theory of Jacobi matrices.
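The group-orbit picture in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes square d x d factors of full rank, the commonly used balance condition W_{p+1}^T W_{p+1} = W_p W_p^T, and the parametrization W_p = Q_p Σ^{1/N} Q_{p-1}^T with Q_0 = V and Q_N = U from the SVD X = U Σ V^T. The function names (`random_orthogonal`, `balanced_factors`, `end_to_end`, `balance_defect`) are hypothetical helpers introduced only for this illustration. Under these assumptions, every choice of the interior orthogonal matrices Q_1, ..., Q_{N-1} gives a different balanced parameter point with the same end-to-end matrix X, which is the overparametrization that the entropy is meant to measure.

```python
import numpy as np


def random_orthogonal(d, rng):
    """Sample a d x d orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Flipping column signs keeps q orthogonal and removes the QR sign ambiguity.
    return q * np.sign(np.diag(r))


def balanced_factors(X, depth, rng):
    """Return [W_1, ..., W_N] with W_N ... W_1 = X and balanced factors.

    Assumption (not from the source text): W_p = Q_p Lam Q_{p-1}^T,
    where Lam = Sigma^(1/N), Q_0 = V, Q_N = U, and Q_1..Q_{N-1} are free.
    """
    d = X.shape[0]
    U, s, Vt = np.linalg.svd(X)
    lam = np.diag(s ** (1.0 / depth))
    Q = [Vt.T] + [random_orthogonal(d, rng) for _ in range(depth - 1)] + [U]
    return [Q[p + 1] @ lam @ Q[p].T for p in range(depth)]


def end_to_end(Ws):
    """Multiply the factors in network order: X = W_N ... W_1."""
    X = np.eye(Ws[0].shape[1])
    for W in Ws:
        X = W @ X
    return X


def balance_defect(Ws):
    """Largest Frobenius norm of W_{p+1}^T W_{p+1} - W_p W_p^T over layers."""
    return max(
        np.linalg.norm(Ws[p + 1].T @ Ws[p + 1] - Ws[p] @ Ws[p].T)
        for p in range(len(Ws) - 1)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((3, 3))
    Ws_a = balanced_factors(X, depth=4, rng=rng)
    Ws_b = balanced_factors(X, depth=4, rng=rng)
    print("reconstruction error a:", np.linalg.norm(end_to_end(Ws_a) - X))
    print("reconstruction error b:", np.linalg.norm(end_to_end(Ws_b) - X))
    print("balance defect a:", balance_defect(Ws_a))
    print("distance between W_1 in a and b:", np.linalg.norm(Ws_a[0] - Ws_b[0]))
```

Both factorizations should reproduce X and satisfy the balance condition up to floating-point error, while their individual layers differ. The set of all such factorizations for a fixed X is the group orbit whose volume the Boltzmann entropy quantifies; the code does not compute that volume or the entropy formula itself.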
