An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function

Nicolas Gillis, Margherita Porcelli, Giovanni Seraghiti

TL;DR

This work addresses nonlinear matrix decomposition with the ReLU function by studying ReLU-NMD and its two main variants, Latent-ReLU-NMD and 3B-ReLU-NMD, and by developing provably convergent algorithms. It proves that Latent-ReLU-NMD can be ill-posed (its infimum may not be attained) in instances where ReLU-NMD is well-posed, and it derives a bound connecting the two formulations. The authors establish convergence guarantees for a block coordinate descent scheme (BCD-NMD) applied to 3B-ReLU-NMD and introduce an extrapolated variant (eBCD-NMD), inspired by LMaFit, with subsequence convergence to KKT points. Numerical experiments on matrix completion with ReLU sampling, Euclidean distance matrix completion (EDMC), low-dimensional embedding, and sparse-data compression show that eBCD-NMD converges faster than BCD-NMD and is competitive with, and often better than, state-of-the-art methods while retaining its convergence guarantees.

Abstract

Nonlinear matrix decomposition (NMD) with the ReLU function, denoted ReLU-NMD, is the following problem: given a sparse, nonnegative matrix $X$ and a factorization rank $r$, identify a rank-$r$ matrix $\Theta$ such that $X\approx \max(0,\Theta)$. This decomposition finds application in data compression, matrix completion with entries missing not at random, and manifold learning. The standard ReLU-NMD model minimizes the least squares error, that is, $\|X - \max(0,\Theta)\|_F^2$. The corresponding optimization problem is nondifferentiable and highly nonconvex. This motivated Saul to propose an alternative model, Latent-ReLU-NMD, where a latent variable $Z$ is introduced and satisfies $\max(0,Z)=X$ while minimizing $\|Z - \Theta\|_F^2$ (``A nonlinear matrix decomposition for mining the zeros of sparse data'', SIAM J. Math. Data Sci., 2022). Our first contribution is to show that the two formulations may yield different low-rank solutions $\Theta$; in particular, we show that Latent-ReLU-NMD can be ill-posed when ReLU-NMD is not, meaning that there are instances in which the infimum of Latent-ReLU-NMD is not attained while that of ReLU-NMD is. We also consider another alternative model, called 3B-ReLU-NMD, which parameterizes $\Theta=WH$, where $W$ has $r$ columns and $H$ has $r$ rows, allowing one to get rid of the rank constraint in Latent-ReLU-NMD. Our second contribution is to prove the convergence of a block coordinate descent (BCD) applied to 3B-ReLU-NMD and referred to as BCD-NMD. Our third contribution is a novel extrapolated variant of BCD-NMD, dubbed eBCD-NMD, which we prove is also convergent under mild assumptions. We illustrate the significant acceleration effect of eBCD-NMD compared to BCD-NMD, and also show that eBCD-NMD performs well against the state of the art on synthetic and real-world data sets.
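
To make the 3B-ReLU-NMD formulation concrete, the following is a minimal NumPy sketch of a three-block alternating scheme for $\min_{Z,W,H} \|Z - WH\|_F^2$ subject to $\max(0,Z)=X$. It is an illustration under stated assumptions, not the authors' BCD-NMD implementation: the function name `bcd_nmd_sketch`, the random initialization, the fixed iteration count, and the use of exact least-squares updates for each block are all choices made here. The extrapolation step of eBCD-NMD is not shown.

```python
import numpy as np

def bcd_nmd_sketch(X, r, n_iter=200, seed=0):
    """Hypothetical sketch: alternate exact updates of Z, W, H for
    min ||Z - W H||_F^2  subject to  max(0, Z) = X  (X assumed nonzero)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    pos = X > 0                           # entries where Z must equal X
    W = rng.standard_normal((m, r))       # assumed random initialization
    H = rng.standard_normal((r, n))
    history = []                          # latent objective after each sweep
    for _ in range(n_iter):
        Theta = W @ H
        # Z-block: Z = X on positive entries, Z = min(0, Theta) elsewhere
        Z = np.where(pos, X, np.minimum(0.0, Theta))
        # W-block: least squares in W for fixed H (solve H^T W^T ~ Z^T)
        W = np.linalg.lstsq(H.T, Z.T, rcond=None)[0].T
        # H-block: least squares in H for fixed W (solve W H ~ Z)
        H = np.linalg.lstsq(W, Z, rcond=None)[0]
        history.append(np.linalg.norm(Z - W @ H) ** 2)
    # Relative ReLU-NMD residual ||X - max(0, W H)||_F / ||X||_F
    rel_err = np.linalg.norm(X - np.maximum(0.0, W @ H)) / np.linalg.norm(X)
    return W, H, rel_err, history

if __name__ == "__main__":
    # Synthetic test matrix X = max(0, W0 H0) with a rank-3 latent factor
    rng = np.random.default_rng(1)
    X = np.maximum(0.0, rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20)))
    W, H, rel_err, history = bcd_nmd_sketch(X, r=3)
    print("relative ReLU-NMD residual:", rel_err)
```

Because each block is minimized exactly, the latent objective stored in `history` is nonincreasing up to rounding error. The ReLU-NMD residual is reported separately, since the two formulations need not share the same solutions.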

Paper Structure

This paper contains 30 sections, 11 theorems, 77 equations, 5 figures, 2 tables, 2 algorithms.

Key Result

Lemma 1

Let $0 < \epsilon < 1 / \sqrt{2}$ be a fixed parameter, $r=1$, and [...]. Then the infimum of Latent-ReLU-NMD in (eq:lat_nmd) is not attained.

Figures (5)

  • Figure 1: One hidden-layer neural network with 5 input and output nodes and 3 hidden nodes.
  • Figure 1: Matrix completion with ReLU sampling: evolution of the average relative residual $\Gamma_k$ w.r.t. CPU time for the noiseless case (left image) and the noisy case (right image).
  • Figure 2: Euclidean distance matrix completion. Distribution of the points (left) and relative error with respect to the Euclidean distance matrix of the points (right). The $x$-axis represents the percentage of known entries.
  • Figure 3: MAD analysis and average iteration time for increasing values of the rank (on the $x$-axis). k1b data set on top and hitech data set at the bottom.
  • Figure 4: Comparison between $50 \%$ compressed images using TSVD and eBCD on MNIST and Phantom data sets.

Theorems & Definitions (20)

  • Lemma 1
  • Proof 1
  • Lemma 2
  • Proof 2
  • Corollary 3
  • Theorem 4
  • Proof 3
  • Theorem 1
  • Proof 4
  • Lemma 2
  • ...and 10 more