An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function
Nicolas Gillis, Margherita Porcelli, Giovanni Seraghiti
TL;DR
This work studies nonlinear matrix decomposition with the ReLU function (ReLU-NMD) together with two alternative formulations, Latent-ReLU-NMD and 3B-ReLU-NMD, and develops provably convergent algorithms for them. It proves that Latent-ReLU-NMD can be ill-posed (its infimum is not attained) on instances where ReLU-NMD is well-posed, and it derives a bound connecting the two formulations. The authors establish convergence guarantees for a block coordinate descent scheme (BCD-NMD) applied to 3B-ReLU-NMD and introduce an extrapolated variant inspired by LMaFit (eBCD-NMD), for which they prove subsequence convergence to KKT points. Numerical experiments on matrix completion with ReLU sampling, Euclidean distance matrix completion (EDMC), low-dimensional embedding, and sparse-data compression show that eBCD-NMD accelerates convergence and is competitive with, and often better than, state-of-the-art methods while retaining its theoretical guarantees.
Abstract
Nonlinear matrix decomposition (NMD) with the ReLU function, denoted ReLU-NMD, is the following problem: given a sparse, nonnegative matrix $X$ and a factorization rank $r$, identify a rank-$r$ matrix $\Theta$ such that $X\approx \max(0,\Theta)$. This decomposition finds applications in data compression, matrix completion with entries missing not at random, and manifold learning. The standard ReLU-NMD model minimizes the least squares error, that is, $\|X - \max(0,\Theta)\|_F^2$. The corresponding optimization problem is nondifferentiable and highly nonconvex. This motivated Saul (``A nonlinear matrix decomposition for mining the zeros of sparse data'', SIAM J. Math. Data Sci., 2022) to propose an alternative model, Latent-ReLU-NMD, which introduces a latent variable $Z$ satisfying $\max(0,Z)=X$ and minimizes $\|Z - \Theta\|_F^2$. Our first contribution is to show that the two formulations may yield different low-rank solutions $\Theta$; in particular, we show that Latent-ReLU-NMD can be ill-posed when ReLU-NMD is not, meaning that there are instances in which the infimum of Latent-ReLU-NMD is not attained while that of ReLU-NMD is. We also consider another alternative model, called 3B-ReLU-NMD, which parameterizes $\Theta=WH$, where $W$ has $r$ columns and $H$ has $r$ rows, thereby removing the rank constraint in Latent-ReLU-NMD. Our second contribution is to prove the convergence of a block coordinate descent (BCD) scheme applied to 3B-ReLU-NMD, referred to as BCD-NMD. Our third contribution is a novel extrapolated variant of BCD-NMD, dubbed eBCD-NMD, which we prove is also convergent under mild assumptions. We illustrate the significant acceleration effect of eBCD-NMD compared to BCD-NMD, and also show that eBCD-NMD performs well against the state of the art on synthetic and real-world data sets.
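To make the 3B-ReLU-NMD model concrete, the following is a minimal NumPy sketch of one plausible three-block coordinate descent on $\|Z - WH\|_F^2$ subject to $\max(0,Z)=X$. The abstract does not spell out the update rules, so everything here is an assumption for illustration: the function name `bcd_nmd_sketch`, the random initialization, the fixed iteration count, and the choice of exact block minimizers (projecting $Z$ onto the constraint set, then solving two least squares problems for $W$ and $H$). The extrapolation step of eBCD-NMD is not reproduced.

```python
import numpy as np


def bcd_nmd_sketch(X, r, n_iter=200, seed=0):
    """Illustrative three-block coordinate descent for
        min_{Z, W, H} ||Z - W H||_F^2   s.t.   max(0, Z) = X.
    Hypothetical sketch; not the authors' reference implementation.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    pos = X > 0  # entries where the constraint fixes Z to X

    # Assumed initialization: random Gaussian factors.
    W = rng.standard_normal((m, r))
    H = rng.standard_normal((r, n))
    errors = []

    for _ in range(n_iter):
        Theta = W @ H

        # Z-block: Z must equal X where X > 0; where X == 0, Z may be
        # any nonpositive value, so the closest choice is min(0, Theta).
        Z = np.where(pos, X, np.minimum(0.0, Theta))

        # W-block: least squares min_W ||Z - W H||_F (solved via H^T W^T = Z^T).
        W = np.linalg.lstsq(H.T, Z.T, rcond=None)[0].T

        # H-block: least squares min_H ||Z - W H||_F.
        H = np.linalg.lstsq(W, Z, rcond=None)[0]

        errors.append(np.linalg.norm(Z - W @ H))

    return W, H, Z, errors


# Usage on synthetic data generated as X = max(0, W0 H0).
rng = np.random.default_rng(1)
X_demo = np.maximum(0, rng.standard_normal((30, 4)) @ rng.standard_normal((4, 25)))
W, H, Z, errors = bcd_nmd_sketch(X_demo, r=4, n_iter=100)
rel_err = np.linalg.norm(X_demo - np.maximum(0, W @ H)) / np.linalg.norm(X_demo)
print(f"relative ReLU-NMD error: {rel_err:.3e}")
```

Because each block is minimized exactly in this sketch, the recorded objective $\|Z - WH\|_F$ cannot increase from one iteration to the next. The printed quantity is the ReLU-NMD error $\|X - \max(0,WH)\|_F / \|X\|_F$, which is the measure of interest even though the iteration minimizes the latent objective.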
