Entropic Regularization in the Deep Linear Network

Alan Chen, Tejas Kotwal, Govind Menon

TL;DR

The paper develops a rigorous framework for entropic regularization of the deep linear network (DLN): the Boltzmann entropy S_N enters a free energy F_β, and the authors analyze the gradient flow of F_β with respect to the DLN metric. For spectral energies, that is, energies depending symmetrically on the singular values, F_β has a unique equilibrium in the singular values, which corresponds to an O_d-orbit of end-to-end maps. Explicit relaxation rates follow from a linearization that separates uniform scaling from relative singular-value adjustments. The paper also contrasts the Euclidean and Riemannian geometries: S_N is strictly concave in the Euclidean geometry but not under the DLN metric, and the Riemannian Hessian is indefinite even at equal singular values. The authors further give an exact solution of the gradient flow on the diagonal (equal-singular-value) manifold and discuss infinite-depth and mean-field implications, laying groundwork for extensions to non-symmetric losses such as matrix completion. Overall, the work connects entropic regularization, Riemannian geometry, and random-matrix-inspired dynamics, and it provides tractable benchmarks for studying implicit regularization in DLNs.

Abstract

We study regularization for the deep linear network (DLN) using the entropy formula introduced in arXiv:2509.09088. The equilibria and gradient flow of the free energy on the Riemannian manifold of end-to-end maps of the DLN are characterized for energies that depend symmetrically on the singular values of the end-to-end matrix. The only equilibria are minimizers and the set of minimizers is an orbit of the orthogonal group. In contrast with random matrix theory there is no singular value repulsion. The corresponding gradient flow reduces to a one-dimensional ordinary differential equation whose solution gives explicit relaxation rates toward the minimizers. We also study the concavity of the entropy in the chamber of singular values. The entropy is shown to be strictly concave in the Euclidean geometry on the chamber but not in the Riemannian geometry defined by the DLN metric.

Paper Structure

This paper contains 40 sections, 29 theorems, 174 equations, 1 figure.

Key Result

Theorem 1.2

There exists a unique equilibrium $\sigma\in\mathcal{S}_d$ of $F_\beta$, and it has the form where $\sigma_\star$ is the unique solution of Moreover, this equilibrium is a minimizer of $F_\beta$ on $\mathcal{S}_d$.

Figures (1)

  • Figure 1: Phase portraits of the gradient flow $\dot{\sigma}=-\mathop{\mathrm{grad}}\nolimits_{g^N}F_\beta$, using the Schatten--$p$ energy $E(\sigma)=\tfrac{1}{p}\sum_i\sigma_i^p$, for $(N,p,\beta)=(10,2,5)$.
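The Schatten-$p$ energy in the caption is a concrete instance of an energy that depends symmetrically on the singular values. The following is a minimal sketch, not the authors' code: it evaluates $E(\sigma)=\tfrac{1}{p}\sum_i\sigma_i^p$ and checks numerically that the energy is unchanged when the end-to-end matrix is multiplied on either side by orthogonal matrices. The function names and the random test matrix are illustrative assumptions. The entropy $S_N$ and the metric $g^N$ are not reproduced on this page, so the gradient flow itself is not simulated.

```python
import numpy as np

def schatten_energy(sigma, p=2.0):
    """E(sigma) = (1/p) * sum_i sigma_i**p for nonnegative singular values."""
    sigma = np.asarray(sigma, dtype=float)
    return float(np.sum(sigma ** p) / p)

def schatten_energy_of_matrix(W, p=2.0):
    """Evaluate the Schatten-p energy on the singular values of a matrix W."""
    sigma = np.linalg.svd(W, compute_uv=False)
    return schatten_energy(sigma, p)

def random_orthogonal(d, rng):
    """Orthogonal matrix from the QR factorization of a Gaussian matrix
    (hypothetical helper, used only for the invariance check)."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return Q

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))      # illustrative end-to-end matrix
Q = random_orthogonal(3, rng)
R = random_orthogonal(3, rng)

p = 2.0                              # the value of p used in Figure 1
e0 = schatten_energy_of_matrix(W, p)
e1 = schatten_energy_of_matrix(Q @ W @ R.T, p)
print(abs(e0 - e1))                  # difference is at floating-point round-off level
```

For $p=2$ the energy equals half the squared Frobenius norm of the matrix, which gives a quick sanity check on the implementation.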

Theorems & Definitions (62)

  • Definition 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Remark 1.4: Infinite depth
  • Corollary 1.5
  • Corollary 1.6
  • Theorem 1.7: Exact solution on $\mathcal{D}$
  • Theorem 1.8
  • Theorem 1.9
  • Theorem 1.10
  • ...and 52 more