Get rid of your constraints and reparametrize: A study in NNLS and implicit bias

Hung-Hsu Chou; Johannes Maly; Claudio Mayrink Verdun; Bernardo Freitas Paulo da Costa; Heudson Mirandola

TL;DR

The paper solves non-negative least squares (NNLS) by reparametrizing the unknown via Hadamard powers, linking overparametrized gradient dynamics to constrained optimization through a Riemannian perspective. It establishes global convergence of the reparametrized gradient flow to NNLS solutions at an $O(1/t)$ rate and develops accelerated dynamics achieving $O(1/t^2)$; discretized gradient methods attain $O(1/k^{\gamma})$ under step-size decay. For sparse recovery, small initialization induces an implicit $\ell_1$-bias, and under standard null space property (NSP) and $\mathcal{M}_+$ conditions, stable NNLS recovery is guaranteed with robustness to negative perturbations. Empirical results corroborate the theory, showing accelerated convergence and better stability than classical NNLS solvers, and suggest that the reparametrization approach can generalize to other constrained problems, such as non-negative matrix factorization.
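
To make the reparametrization concrete, the following worked derivation sketches how an unconstrained flow on a Hadamard power induces a flow that stays in the non-negative orthant. It assumes the single-factor parametrization ${\bf x} = {\bf u}^{\odot L}$; the symbols $f$, ${\bf u}$ and ${\bf g}$ are introduced here for illustration only, and the constants may differ from the paper's normalization.

$$
\begin{aligned}
f({\bf u}) &= \tfrac{1}{2}\,\|{\bf A}{\bf u}^{\odot L} - {\bf b}\|_2^2, \qquad {\bf x} = {\bf u}^{\odot L}, \qquad {\bf g} := {\bf A}^\top({\bf A}{\bf x} - {\bf b}),\\
{\bf u}'(t) &= -\nabla f({\bf u}(t)) = -L\,{\bf u}^{\odot (L-1)} \odot {\bf g},\\
{\bf x}'(t) &= L\,{\bf u}^{\odot (L-1)} \odot {\bf u}'(t) = -L^2\,{\bf x}^{\odot (2 - 2/L)} \odot {\bf g}.
\end{aligned}
$$

Each coordinate of the right-hand side vanishes when the corresponding coordinate of ${\bf x}$ is zero, so under this sketch a trajectory started at ${\bf x}_0 > \boldsymbol{0}$ remains in the positive orthant without any projection step.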

Abstract

Over the past years, there has been significant interest in understanding the implicit bias of gradient descent optimization and its connection to the generalization properties of overparametrized neural networks. Several works observed that when training linear diagonal networks on the square loss for regression tasks (which corresponds to overparametrized linear regression) gradient descent converges to special solutions, e.g., non-negative ones. We connect this observation to Riemannian optimization and view overparametrized GD with identical initialization as a Riemannian GD. We use this fact for solving non-negative least squares (NNLS), an important problem behind many techniques, e.g., non-negative matrix factorization. We show that gradient flow on the reparametrized objective converges globally to NNLS solutions, providing convergence rates also for its discretized counterpart. Unlike previous methods, we do not rely on the calculation of exponential maps or geodesics. We further show accelerated convergence using a second-order ODE, lending itself to accelerated descent methods. Finally, we establish the stability against negative perturbations and discuss generalization to other constrained optimization problems.
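
As a rough illustration of the discretized counterpart mentioned above, here is a minimal sketch of plain gradient descent on the reparametrized objective $\tfrac{1}{2}\|{\bf A}{\bf u}^{\odot L} - {\bf b}\|_2^2$ with ${\bf x} = {\bf u}^{\odot L}$. The function name `nnls_reparam_gd`, its parameters, and the default step size are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def nnls_reparam_gd(A, b, L=2, init=0.1, step=0.1, iters=2000):
    """Gradient descent on f(u) = 0.5 * ||A u^L - b||^2 (entrywise power).

    Returns x = u^L. Names and defaults are illustrative assumptions.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    # Identical positive initialization for every coordinate.
    u = np.full(A.shape[1], float(init))
    for _ in range(iters):
        x = u ** L
        # Gradient of the least-squares loss with respect to x.
        g = A.T @ (A @ x - b)
        # Chain rule through x = u^L gives the gradient with respect to u.
        u = u - step * L * u ** (L - 1) * g
    return u ** L


if __name__ == "__main__":
    # Toy instance: the unconstrained minimizer (1, -1) is infeasible,
    # the non-negative minimizer is (1, 0).
    A = np.eye(2)
    b = np.array([1.0, -1.0])
    print(nnls_reparam_gd(A, b))
```

On this toy instance the iterates approach the constrained minimizer $(1, 0)$ although no constraint is ever enforced explicitly. The step size 0.1 is hand-picked for this example; other matrices ${\bf A}$ generally need a step tuned to their scale.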

Paper Structure (32 sections, 16 theorems, 123 equations, 11 figures)

INTRODUCTION
Contribution and Outline
Notation
THEORETICAL RESULTS
Convergence rate of gradient flow
Convergence rate of accelerated reparametrized flow
Convergence rate of gradient descent
NNLS for sparse recovery
NUMERICAL EXPERIMENTS
Different stepsizes
Acceleration
Stepsize decay
Initialization and Number of Layers
Stability with Negative Entries
CONCLUSION
...and 17 more sections

Key Result

Theorem 2.1

Let $L\geq 2$, ${\bf A}\in\mathbb{R}^{M\times N}$, and ${\bf b}\in\mathbb{R}^{M}$. Let ${\bf x}_0 > \boldsymbol{0}$ be fixed and let ${\bf x}(t)$ follow the flow ${\bf x}'(t) = -\nabla \mathcal{L}({\bf x}(t))$ with ${\bf x}(0) = {\bf x}_0$. Let $S_+$ be the set defined in (eq:NNLS) and let […] for any $t > 0$ and any ${\bf x}_+ \in S_+$.

Figures (11)

Figure 1: Convergence rate for various choices of step-size, see Section "Different stepsizes".
Figure 2: Accelerated gradient methods: (a) numerical solution of the continuous-time accelerated ODE and (b) behavior of discretized accelerated gradient.
Figure 3: Impact of decaying stepsize rate $\gamma$, for (a) $L=2$ and (b) $L=3$.
Figure 4: Influence of initialization and number of layers on GD-$n$L, cf. Section "Initialization and Number of Layers".
Figure 5: Illustration of the MNIST reconstruction, see Section "Stability with Negative Entries".
...and 6 more figures

Theorems & Definitions (40)

Remark 1.1
Theorem 2.1
Proof sketch
Remark 2.2
Remark 2.3
Theorem 2.4
Proof sketch
Remark 2.5
Theorem 2.6
Proof sketch
...and 30 more
