On the Loss Landscape Geometry of Regularized Deep Matrix Factorization: Uniqueness and Sharpness

Anil Kamber, Rahul Parhi

Abstract

Weight decay is ubiquitous in training deep neural network architectures. Its empirical success is often attributed to capacity control; nonetheless, our theoretical understanding of its effect on the loss landscape and the set of minimizers remains limited. In this paper, we show that $\ell^2$-regularized deep matrix factorization/deep linear network training problems with squared-error loss admit a unique end-to-end minimizer for all target matrices subject to factorization, except for a set of Lebesgue measure zero determined by the depth and the regularization parameter. This observation reveals fundamental properties of the loss landscape of regularized deep matrix factorization problems: the Hessian spectrum is constant across all minimizers of the regularized deep scalar factorization problem with squared-error loss. Moreover, we show that, in regularized deep matrix factorization problems with squared-error loss, if the target matrix does not belong to the Lebesgue measure-zero set, then the Frobenius norm of each layer is constant across all minimizers. This, in turn, yields a global lower bound on the trace of the Hessian evaluated at any minimizer of the regularized deep matrix factorization problem. Furthermore, we establish a critical threshold for the regularization parameter above which the unique end-to-end minimizer collapses to zero.
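For reference, the training problem described in the abstract can be written out explicitly. The display below is a standard formulation rather than a quotation from the paper: the target matrix is denoted $\mathbf{M}^\natural$ following the figure captions later on this page, and the exact placement of the depth $L$ and the regularization parameter $\lambda$ is an assumption.

$$\min_{\mathbf{W}_1, \dots, \mathbf{W}_L} \; \bigl\| \mathbf{M}^\natural - \mathbf{W}_L \mathbf{W}_{L-1} \cdots \mathbf{W}_1 \bigr\|_F^2 \;+\; \lambda \sum_{i=1}^{L} \| \mathbf{W}_i \|_F^2$$

The "end-to-end minimizer" of the abstract refers to the product $\mathbf{W}_L \cdots \mathbf{W}_1$ evaluated at a minimizer of this objective.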

Paper Structure

This paper contains 19 sections, 20 theorems, 140 equations, and 3 figures.

Key Result

Theorem 2

For any $\mathbf{w}^* \in \Omega$, the layers (factors) are balanced, i.e., $\mathbf{W}_{i+1}^{\top} \mathbf{W}_{i+1} = \mathbf{W}_{i} \mathbf{W}_{i}^{\top}$ for all $i$, which implies that all layers possess exactly the same singular values. Furthermore, if the singular values are distinct, then the left and right singular vectors of the layers align, up to an unavoidable sign ambiguity.
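Balancedness is easy to check numerically. The following is a minimal sketch, assuming plain gradient descent on the $\ell^2$-regularized squared-error objective reaches (a neighborhood of) a minimizer; the dimensions, step size, iteration count, and regularization strength are arbitrary illustrative choices, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, lam, lr, T = 3, 4, 0.1, 5e-4, 100_000   # arbitrary illustrative choices

M = rng.standard_normal((d, d))                       # arbitrary target matrix
W = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(L)]

def prod(mats):
    """Return mats[-1] @ ... @ mats[0]; identity for an empty list."""
    P = np.eye(d)
    for A in mats:
        P = A @ P
    return P

for _ in range(T):
    R = prod(W) - M                                   # residual of the data-fitting term
    # gradient of ||W_L...W_1 - M||_F^2 + lam * sum_j ||W_j||_F^2 w.r.t. each W_i
    grads = [2 * prod(W[i + 1:]).T @ R @ prod(W[:i]).T + 2 * lam * W[i]
             for i in range(L)]
    W = [Wi - lr * G for Wi, G in zip(W, grads)]

# Theorem 2's balance condition W_{i+1}^T W_{i+1} = W_i W_i^T, which forces all
# layers to share the same singular values:
for i in range(L - 1):
    gap = np.linalg.norm(W[i + 1].T @ W[i + 1] - W[i] @ W[i].T)
    print(f"balance gap between layers {i + 1} and {i + 2}: {gap:.2e}")
for i, Wi in enumerate(W):
    print(f"singular values of layer {i + 1}:",
          np.round(np.linalg.svd(Wi, compute_uv=False), 4))
```

After enough iterations the printed balance gaps should be small and the rows of singular values should (approximately) agree across layers, matching the theorem's conclusion.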

Figures (3)

  • Figure 1: Behavior of the data-fitting term $D(\rho)$, the regularization term $R(\rho)$, and the optimization objective $\phi(\rho)$ for a depth-$3$ factorization of $3$ with $\lambda=4$.
  • Figure 2: Behavior of $\phi(\rho)$ for a depth-$5$ factorization of $-3$ under different regularization parameters. The threshold is computed as $\tau = \Bigl(\frac{|m|}{\left(1-\frac{q}{2}\right)(1-q)^{(q-1)/(2-q)}}\Bigr)^{2-q}$, where $q := 2/L$ (see the sketch after this list).
  • Figure 3: Frobenius norm of the converged point obtained after $T=5000$ steps of GD with regularization parameter $\lambda = \alpha \tau$ applied to the main training objective, where $\tau = \Bigl(\frac{\|\sigma(\mathbf{M}^\natural)\|_{\infty}}{\left(1-\frac{q}{2}\right)(1-q)^{(q-1)/(2-q)}}\Bigr)^{2-q}$.
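The scalar picture behind Figures 1 and 2 can be reproduced in a few lines. The sketch below assumes the effective scalar objective $\phi(\rho) = (m - \rho)^2 + \lambda |\rho|^{q}$ with $q = 2/L$, i.e., with the penalty normalized so that the stated $\tau$ is exactly the collapse threshold (the paper's normalization may differ by a depth-dependent factor), and it reads the grouping of the $\tau$ formula as $|m|$ divided by the product $(1-\frac{q}{2})(1-q)^{(q-1)/(2-q)}$. Under that reading, a short calculation shows the nonzero minimizer of $\phi$ ties with $\rho = 0$ exactly at $\lambda = \tau$.

```python
import numpy as np

def phi(rho, m, lam, L):
    """Effective scalar objective: squared error plus induced |rho|^(2/L) penalty
    (normalization chosen so that tau below is exactly the collapse threshold)."""
    q = 2.0 / L
    return (m - rho) ** 2 + lam * np.abs(rho) ** q

def tau(m, L):
    """Collapse threshold from the Figure 2 caption (grouping as read above)."""
    q = 2.0 / L
    return (abs(m) / ((1 - q / 2) * (1 - q) ** ((q - 1) / (2 - q)))) ** (2 - q)

m, L = -3.0, 5                      # the depth-5 factorization of -3 from Figure 2
t = tau(m, L)
rho = np.linspace(-6.0, 6.0, 240001)
for alpha in (0.5, 0.9, 1.1, 2.0):  # regularization below and above the threshold
    rho_star = rho[np.argmin(phi(rho, m, alpha * t, L))]
    print(f"lambda = {alpha:.1f} * tau  ->  argmin phi(rho) = {rho_star:+.3f}")
```

Running this prints a nonzero minimizer for $\alpha < 1$ and a collapse to $\rho^* = 0$ for $\alpha > 1$, illustrating the threshold behavior established in the paper.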

Theorems & Definitions (52)

  • Definition 1
  • Theorem 2 (chen2025complete)
  • Theorem 3
  • Proof
  • Remark 4
  • Theorem 5
  • Proof
  • Theorem 6
  • Proof
  • Remark 7
  • ...and 42 more