Get rid of your constraints and reparametrize: A study in NNLS and implicit bias
Hung-Hsu Chou, Johannes Maly, Claudio Mayrink Verdun, Bernardo Freitas Paulo da Costa, Heudson Mirandola
TL;DR
The paper solves non-negative least squares (NNLS) by reparametrizing the optimization variable via Hadamard powers, connecting overparametrized gradient dynamics to constrained optimization through a Riemannian perspective. It establishes global convergence of the reparametrized gradient flow to NNLS solutions with an $O(1/t)$ rate, and develops accelerated dynamics achieving $O(1/t^2)$, with discretized gradient methods attaining $O(1/k^{\gamma})$ under step-size decay. For sparse recovery, small initialization induces an implicit $\ell_1$-bias, and under the standard null space property (NSP) and $\mathcal{M}_+$ conditions, stable NNLS recovery is guaranteed with robustness to negative perturbations. Empirical results corroborate the theory, showing accelerated convergence and superior stability compared to classical NNLS solvers, and suggest that the reparametrization approach can generalize to other constrained problems, such as non-negative matrix factorization.
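As a worked illustration of the reparametrization idea, the following sketch covers the simplest case, a Hadamard square $x = u \odot u$. The depth-2 choice, the factor $\tfrac{1}{2}$ in the loss, and the symbols $A$, $b$, $u$ are assumptions made here for illustration; the paper treats general Hadamard powers.

$$
\begin{aligned}
x &= u \odot u, \qquad \mathcal{L}(u) = \tfrac{1}{2}\,\|A(u \odot u) - b\|_2^2, \\
\dot{u} &= -\nabla \mathcal{L}(u) = -2\, u \odot A^\top (A x - b), \\
\dot{x} &= 2\, u \odot \dot{u} = -4\, x \odot A^\top (A x - b).
\end{aligned}
$$

In this sketch the flow in $x$ is the Euclidean gradient of the least-squares loss rescaled entrywise by $x$ itself. Because $x = u \odot u$, every entry of $x$ remains non-negative along the trajectory without any projection step, which is one way to read the link between the unconstrained dynamics in $u$ and the constrained problem in $x$.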
Abstract
In recent years, there has been significant interest in understanding the implicit bias of gradient descent (GD) optimization and its connection to the generalization properties of overparametrized neural networks. Several works observed that when training linear diagonal networks on the square loss for regression tasks (which corresponds to overparametrized linear regression), gradient descent converges to special solutions, e.g., non-negative ones. We connect this observation to Riemannian optimization and view overparametrized GD with identical initialization as a Riemannian GD. We use this fact for solving non-negative least squares (NNLS), an important problem behind many techniques, e.g., non-negative matrix factorization. We show that gradient flow on the reparametrized objective converges globally to NNLS solutions, and we provide convergence rates for its discretized counterpart as well. Unlike previous methods, we do not rely on the calculation of exponential maps or geodesics. We further show accelerated convergence using a second-order ODE, which lends itself to accelerated descent methods. Finally, we establish stability against negative perturbations and discuss generalization to other constrained optimization problems.
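To make the discretized counterpart concrete, here is a minimal sketch of plain gradient descent on the reparametrized objective, assuming a Hadamard square $x = u \odot u$ and a constant step size. The function name `nnls_reparam_gd`, the helpers `matvec` and `rmatvec`, the default values of `alpha`, `step`, and `iters`, and the toy data are all hypothetical choices for illustration and are not taken from the paper. Plain Python lists are used to keep the example dependency-free.

```python
def matvec(A, x):
    """Return A @ x for a matrix A stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Return A^T @ r for a matrix A stored as a list of rows."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def nnls_reparam_gd(A, b, alpha=0.1, step=0.05, iters=2000):
    """Gradient descent on L(u) = 0.5 * ||A (u*u) - b||^2; returns x = u*u.

    alpha, step and iters are illustrative defaults (assumptions), tuned
    only for the small toy problem below.
    """
    u = [alpha] * len(A[0])  # identical initialization: every entry equals alpha
    for _ in range(iters):
        x = [u_j * u_j for u_j in u]
        residual = [p - q for p, q in zip(matvec(A, x), b)]
        g = rmatvec(A, residual)  # gradient of the least-squares loss w.r.t. x
        # Chain rule through x = u*u: dL/du_j = 2 * u_j * g_j
        u = [u_j - step * 2.0 * u_j * g_j for u_j, g_j in zip(u, g)]
    return [u_j * u_j for u_j in u]  # entrywise squares, hence non-negative


# Toy problem: the unconstrained least-squares solution is (1, -1),
# whereas the non-negative solution is (0.2, 0).
A = [[2.0, 1.0], [1.0, 2.0]]
b = [1.0, -1.0]
x_hat = nnls_reparam_gd(A, b)
```

No projection onto the non-negative orthant appears anywhere in the loop: the constraint is enforced by the parametrization itself. The constant step size used here is a simplification; the convergence rates discussed in the paper involve their own step-size conditions, which this sketch does not attempt to reproduce.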
