On the Benefits of Weight Normalization for Overparameterized Matrix Sensing

Yudong Wei; Liang Zhang; Bingcong Li; Niao He

TL;DR

The paper addresses the recovery of a low-rank PSD matrix $\mathbf{A}$ from linear measurements by applying generalized weight normalization (WN) to a matrix factorization via polar decomposition. It develops a Riemannian gradient descent (RGD) scheme on the Stiefel manifold for the direction $\mathbf{X}$, paired with gradient-based updates for the magnitude $\mathbf{\Theta}$, which yields two-phase convergence: a saddle-escape phase followed by linear convergence. The main contributions are (i) an exponential improvement in convergence rate over standard gradient methods, (ii) polynomial improvements in iteration and sample complexity as the level of overparameterization increases, and (iii) numerical validation on synthetic and real data, including image reconstruction. The results provide theoretical and empirical evidence that overparameterization, when combined with weight normalization, can be leveraged to accelerate nonconvex matrix sensing and potentially other learning problems.
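To make the direction/magnitude split concrete, here is a minimal NumPy sketch of one way such a scheme could look. It is an illustration under stated assumptions, not the authors' implementation: it assumes the parameterization $\mathbf{A} \approx \mathbf{X}\mathbf{\Theta}\mathbf{X}^\top$ with $\mathbf{X}^\top\mathbf{X} = \mathbf{I}$ and symmetric $\mathbf{\Theta}$, a squared loss scaled by $1/(2n)$, symmetric Gaussian sensing matrices, a polar retraction, a zero initialization of $\mathbf{\Theta}$, and hand-picked step sizes. All function names (`make_problem`, `loss_and_grads`, `riemannian_grad`, `polar_retraction`, `wn_matrix_sensing`) are hypothetical.

```python
import numpy as np


def make_problem(m=8, rank_a=2, n=400, seed=0):
    """Synthetic PSD target A and symmetric Gaussian sensing matrices M_i (assumed setup)."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, rank_a)))
    A = U @ np.diag(np.linspace(1.0, 0.5, rank_a)) @ U.T
    G = rng.standard_normal((n, m, m))
    M = 0.5 * (G + np.transpose(G, (0, 2, 1)))  # symmetrize each M_i
    y = np.einsum("kij,ij->k", M, A)            # y_i = <M_i, A>
    return A, M, y


def loss_and_grads(X, Theta, M, y):
    """Loss 1/(2n) * sum_i (<M_i, X Theta X^T> - y_i)^2 and its Euclidean gradients."""
    n = y.shape[0]
    residual = np.einsum("kij,ij->k", M, X @ Theta @ X.T) - y
    R = np.einsum("k,kij->ij", residual, M) / n  # symmetric because each M_i is
    loss = 0.5 * np.mean(residual ** 2)
    grad_X = 2.0 * R @ X @ Theta                 # uses symmetry of R and Theta
    grad_Theta = X.T @ R @ X
    return loss, grad_X, grad_Theta


def riemannian_grad(X, grad_X):
    """Project a Euclidean gradient onto the tangent space of the Stiefel manifold at X."""
    XtG = X.T @ grad_X
    return grad_X - X @ (0.5 * (XtG + XtG.T))


def polar_retraction(Y):
    """Map a full-column-rank Y back to the Stiefel manifold via its polar factor."""
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ Vt


def wn_matrix_sensing(M, y, r, eta=0.05, mu=0.05, iters=300, seed=1):
    """Riemannian step on the direction X, plain gradient step on the magnitude Theta."""
    m = M.shape[1]
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((m, r)))  # random orthonormal start
    Theta = np.zeros((r, r))                          # assumed initialization
    losses = []
    for _ in range(iters):
        loss, g_X, g_Theta = loss_and_grads(X, Theta, M, y)
        losses.append(loss)
        X = polar_retraction(X - eta * riemannian_grad(X, g_X))
        Theta = Theta - mu * g_Theta
    losses.append(loss_and_grads(X, Theta, M, y)[0])
    return X, Theta, losses
```

Calling `A, M, y = make_problem()` followed by `X, Theta, losses = wn_matrix_sensing(M, y, r=4)` runs the sketch in an overparameterized setting (factor rank 4 against a rank-2 target). The retraction keeps `X` orthonormal at every iteration, so the scale of the estimate is carried entirely by `Theta`, which is the separation that weight normalization is meant to provide.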

Abstract

While normalization techniques are widely used in deep learning, their theoretical understanding remains relatively limited. In this work, we establish the benefits of (generalized) weight normalization (WN) applied to the overparameterized matrix sensing problem. We prove that WN with Riemannian optimization achieves linear convergence, yielding an exponential speedup over standard methods that do not use WN. Our analysis further demonstrates that both iteration and sample complexity improve polynomially as the level of overparameterization increases. To the best of our knowledge, this work provides the first characterization of how WN leverages overparameterization for faster convergence in matrix sensing.
