Nonnegative Matrix Factorization in the Component-Wise L1 Norm for Sparse Data

Giovanni Seraghiti, Kévin Dubrulle, Arnaud Vandaele, Nicolas Gillis

Abstract

Nonnegative matrix factorization (NMF) approximates a nonnegative matrix, $X$, by the product of two nonnegative factors, $WH$, where $W$ has $r$ columns and $H$ has $r$ rows. In this paper, we consider NMF using the component-wise L1 norm as the error measure (L1-NMF), which is suited for data corrupted by heavy-tailed noise, such as Laplace noise or salt-and-pepper noise, or in the presence of outliers. Our first contribution is an NP-hardness proof for L1-NMF, even when $r=1$, in contrast to standard NMF with least squares, which is solvable in polynomial time for $r=1$. Our second contribution is to show that L1-NMF strongly enforces sparsity in the factors for sparse input matrices, thereby favoring interpretability. However, if the data contains false zeros, overly sparse solutions may degrade the model. Our third contribution is a new, more general L1-NMF model for sparse data, dubbed weighted L1-NMF (wL1-NMF), in which the sparsity of the factorization is controlled by a penalization parameter applied to the entries of $WH$ associated with zeros in the data. Our fourth contribution is a new coordinate descent (CD) approach for wL1-NMF, denoted sparse CD (sCD), in which each subproblem is solved by a weighted median algorithm. To the best of our knowledge, sCD is the first algorithm for L1-NMF whose complexity scales with the number of nonzero entries in the data, making it efficient for large-scale, sparse data. We perform extensive numerical experiments on synthetic and real-world data to demonstrate the effectiveness of the proposed model (wL1-NMF) and algorithm (sCD).
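The abstract does not state the wL1-NMF objective explicitly; as a minimal formalization consistent with the description above (our notation, which may differ from the paper's exact formulation), the model can be read as
$$\min_{W \ge 0,\, H \ge 0} \; \sum_{(i,j):\, X_{ij} \neq 0} \bigl| X_{ij} - (WH)_{ij} \bigr| \;+\; \lambda \sum_{(i,j):\, X_{ij} = 0} (WH)_{ij},$$
where $\lambda \ge 0$ is the penalization parameter on the entries of $WH$ associated with zeros in $X$ (the $\lambda$ varied in Figure 5 below). Note that $W, H \ge 0$ implies $|X_{ij} - (WH)_{ij}| = (WH)_{ij}$ whenever $X_{ij} = 0$, so under this reading $\lambda = 1$ recovers plain L1-NMF.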

Paper Structure

This paper contains 22 sections, 2 theorems, 47 equations, 5 figures, 4 tables, and 3 algorithms.

Key Result

Theorem 1

The instance $(X,T)$ is a yes-instance of L1-NMF if and only if $(M,D)$ is a yes-instance of L1-LRMA. Hence rank-one L1-NMF is NP-hard.
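The instances $(X,T)$ and $(M,D)$ are defined in the paper; a plausible reading of the decision version of rank-one L1-NMF (our reconstruction, not the paper's exact statement) is: given a nonnegative matrix $X$ and a threshold $T \ge 0$, decide whether there exist nonnegative vectors $w$ and $h$ such that
$$\lVert X - wh^{\top} \rVert_1 = \sum_{i,j} \bigl| X_{ij} - w_i h_j \bigr| \le T,$$
with $(M,D)$ the analogous instance of L1 low-rank matrix approximation (L1-LRMA) without the nonnegativity constraints. The stated equivalence then transfers the known NP-hardness of rank-one L1-LRMA to rank-one L1-NMF.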

Figures (5)

  • Figure 1: Example of a one-dimensional nonnegative least absolute deviation (LAD) problem \eqref{eq:gen_scalar_prob}; a minimal sketch of this scalar subproblem appears after this list.
  • Figure 2: Probability that the least squares solution (left) and the LAD solution (right) of \eqref{eq:l1_l2_comp} is greater than zero when each component of the inputs is sampled from a Bernoulli distribution on $\{0,1\}$ with probability $p$ of being 1.
  • Figure 3: Comparison of the robustness of various NMF models on the MNIST dataset as a function of the noise level. Relative error w.r.t. the ground truth; the noisy-data curve reports $\lVert X - \bar{X} \rVert_F / \lVert X \rVert_F$.
  • Figure 4: Examples of the low-rank representations of digits using different NMF models on the MNIST dataset.
  • Figure 5: Relative error, averaged over 10 runs, between the noiseless data matrix and the wL1-NMF approximations for different values of $\lambda$. Each plot corresponds to a different percentage $q_1$ of zeros, and each line represents a percentage $q_2$ of false zeros among the missing values.
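The first figure listed above illustrates the one-dimensional nonnegative LAD problem that each sCD coordinate update reduces to, which the abstract says is solved by a weighted median algorithm. Below is a minimal, hypothetical Python sketch of that building block, minimizing $\sum_k |b_k - a_k x|$ over $x \ge 0$; the function names and the exact form of the subproblem are our assumptions, not taken from the paper.

```python
import numpy as np

def weighted_median(points, weights):
    """Return a minimizer of sum_k weights[k] * |points[k] - x|."""
    order = np.argsort(points)
    pts, wts = points[order], weights[order]
    csum = np.cumsum(wts)
    # The first sorted point whose cumulative weight reaches half the
    # total weight is a weighted median, hence a minimizer.
    idx = np.searchsorted(csum, 0.5 * csum[-1])
    return pts[idx]

def nonneg_lad_scalar(a, b):
    """Minimize sum_k |b[k] - a[k] * x| over x >= 0 (scalar LAD subproblem).

    For a[k] != 0, |b[k] - a[k]*x| = |a[k]| * |b[k]/a[k] - x|, so the
    objective is a weighted sum of absolute deviations from the ratios
    b/a, minimized by their weighted median. The objective is convex in
    x, so the nonnegativity constraint is handled by projecting a
    negative unconstrained minimizer to zero.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mask = a != 0.0  # terms with a[k] = 0 are constant in x
    x = weighted_median(b[mask] / a[mask], np.abs(a[mask]))
    return max(x, 0.0)

# Example with made-up data: minimize |2 - x| + |2 - 2x| + |3 - x| over x >= 0.
print(nonneg_lad_scalar([1.0, 2.0, 1.0], [2.0, 2.0, 3.0]))  # -> 1.0
```

Since sorting dominates, each such update costs $O(k \log k)$ in the number of terms; restricting the sums to coordinates with nonzero data entries is what would make the overall complexity scale with the number of nonzeros, as the abstract claims for sCD.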

Theorems & Definitions (4)

  • Theorem 1
  • Proof 1
  • Lemma 1
  • Proof 2