Learning nonnegative matrix factorizations from compressed data

Abraar Chaudhry, Elizaveta Rebrova

Abstract

We propose a flexible and theoretically supported framework for scalable nonnegative matrix factorization. The goal is to find nonnegative low-rank components directly from compressed measurements, accessing the original data only once or twice. We consider compression through randomized sketching methods that can either be adapted to the data or be oblivious. We formulate optimization problems that depend only on the compressed data but can recover a nonnegative factorization that closely approximates the original matrix. These problems can be approached with a variety of algorithms; in particular, we discuss variations of the popular multiplicative updates method for these compressed problems. We demonstrate the success of our approaches empirically and validate their performance in real-world applications.
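For reference, the classical multiplicative updates (MU) of Lee and Seung for the uncompressed problem $\min_{{\mathbf{U}}, {\mathbf{V}} \ge 0} \|{\mathbf{X}} - {\mathbf{U}}{\mathbf{V}}^T\|_F^2$ can be sketched as follows. This is only the full-data baseline (the "Uncompressed" curve in the figures); the compressed MU variants discussed in the paper modify these updates and are not reproduced here. The function name `nmf_mu`, the iteration count, the `eps` guard, and the random initialization are illustrative assumptions.

```python
import numpy as np

def nmf_mu(X, r, iters=200, eps=1e-12, seed=0):
    """Classical multiplicative updates for min ||X - U V^T||_F^2 with U, V >= 0.

    Uncompressed baseline only (hypothetical helper, not the paper's compressed MU).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, r))  # nonnegative random initialization
    V = rng.random((n, r))
    losses = []
    for _ in range(iters):
        # Update U with V fixed: multiply by the ratio of the negative and
        # positive parts of the gradient, which keeps U nonnegative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        # Update V with U fixed, by symmetry.
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
        losses.append(np.linalg.norm(X - U @ V.T) ** 2)  # squared Frobenius loss
    return U, V, losses

# Usage on synthetic data with an exact rank-5 nonnegative factorization.
rng = np.random.default_rng(1)
X = rng.random((100, 5)) @ rng.random((5, 80))
U, V, losses = nmf_mu(X, r=5)
print(losses[0], losses[-1])
```

The multiplicative form is what makes MU attractive here: nonnegativity is preserved automatically as long as every factor in the ratio is nonnegative, which is exactly the property a compressed variant has to maintain.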

Paper Structure (21 sections, 12 theorems, 77 equations, 5 figures, 3 tables)

Key Result

Theorem 1

Suppose ${\mathbf{X}}$ has an exact nonnegative factorization ${\mathbf{X}} = {\mathbf{U}}_0{\mathbf{V}}_0^T$, where ${\mathbf{U}}_0 \in {\mathbb{R}}_+^{m \times r}$ and ${\mathbf{V}}_0 \in {\mathbb{R}}_+^{n \times r}$ with $r \leq \min\{n,m\}$. Let ${\mathbf{A}}_1$ and ${\mathbf{A}}_2$ be matrices of si… where $({\mathbf{U}}, {\mathbf{V}} \ge 0)$ means $({\mathbf{U}} \in {\mathbb{R}}_+^{m \times r}, {\mathbf{V}} \in {\mathbb{R}}_+^{n \times r})$.
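The statement above is truncated, so the sketch below only illustrates its setting and is not the theorem itself. It builds an exactly factorizable ${\mathbf{X}}$, forms two Gaussian sketches, and checks two facts that hold in this setting: the true factors make both sketched residuals vanish, and each sketch keeps rank $r$ when the sketch size $k \ge r$. The shapes ${\mathbf{A}}_1 \in {\mathbb{R}}^{k \times m}$ and ${\mathbf{A}}_2 \in {\mathbb{R}}^{n \times k}$, the objective $\|{\mathbf{A}}_1({\mathbf{X}} - {\mathbf{U}}{\mathbf{V}}^T)\|_F^2 + \|({\mathbf{X}} - {\mathbf{U}}{\mathbf{V}}^T){\mathbf{A}}_2\|_F^2$, the helper `sketched_residual`, and all sizes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r, k = 60, 50, 4, 10  # hypothetical sizes, with k >= r

# Exact nonnegative rank-r factorization X = U0 V0^T.
U0 = rng.random((m, r))
V0 = rng.random((n, r))
X = U0 @ V0.T

# Assumed oblivious Gaussian sketches: A1 is k x m, A2 is n x k.
A1 = rng.standard_normal((k, m))
A2 = rng.standard_normal((n, k))

# The compressed data; after this, X itself is no longer needed.
Y1 = A1 @ X  # k x n
Y2 = X @ A2  # m x k

def sketched_residual(U, V):
    """Hypothetical two-sided objective evaluated from the sketches only:
    ||A1 (X - U V^T)||_F^2 + ||(X - U V^T) A2||_F^2."""
    left = Y1 - (A1 @ U) @ V.T
    right = Y2 - U @ (V.T @ A2)
    return np.linalg.norm(left) ** 2 + np.linalg.norm(right) ** 2

print(sketched_residual(U0, V0))  # zero up to rounding error
print(np.linalg.matrix_rank(Y1), np.linalg.matrix_rank(Y2))
```

Vanishing residuals at the true factors only show that $({\mathbf{U}}_0, {\mathbf{V}}_0)$ is a minimizer of the compressed objective; whether it is the only one (up to scaling) is what a recovery theorem has to establish.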

Figures (5)

  • Figure 1: NMF recovery of the synthetic data with MU from the full data (Uncompressed), from compressed data with one-sided data-adapted sketches using only $4\%$ of the original memory (One-sided), and with two-sided sketches using $8\%$ of the original memory (Two-sided). In both compressed cases, i.i.d. Gaussian sketching matrices are used.
  • Figure 2: NMF recovery of the synthetic data with MU. Panels (c,d) show that MU on data-adapted and on random two-sided sketched data also tends to the limiting similarity $1.0$. Across all methods, less compression (larger $k$) improves convergence.
  • Figure 3: Effect of the regularization parameter $\lambda$ on the one-sided MU updates. The 20News dataset is compressed with data-adapted one-sided measurements, and $\sigma$ is chosen as the smallest value such that ${\mathbf{A}}{\mathbf{A}}^T \geq -\sigma$. Without regularization, convergence is compromised; too strong regularization results in a higher loss.
  • Figure 4: (a,c): recovery from random Gaussian measurements, averaged over $5$ runs; our two-sided MU methods converge better than WL wang2010efficient. (b,d): data-adapted methods with sketching matrices obtained from the randomized rangefinder algorithm (as in Corollary 1); our two-sided MU methods perform slightly better than TS tepper2016compressed after enough iterations.
  • Figure 5: Six "representative" faces from the Faces dataset, learned from a compressed dataset about $5\%$ the size of the original data. A data-adapted compression matrix ${\mathbf{A}}$ is used.
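The memory percentages in Figure 1 follow from simple counting. A one-sided sketch of an $m \times n$ matrix stores a $k \times n$ array, and a two-sided sketch additionally stores an $m \times k$ array, so for a square matrix the two-sided version costs exactly twice as much. The sketch below assumes that only the sketched data is counted (not the sketching matrices); the helper `memory_fraction` and the sizes are hypothetical and are not the paper's experimental setup.

```python
def memory_fraction(m, n, k, two_sided=False):
    """Fraction of the entries of an m x n matrix kept by its sketch(es).

    Assumption: only the sketched data is counted, i.e. A X (k x n) and,
    for two-sided sketching, also X B (m x k); sketching matrices are ignored.
    """
    stored = k * n + (m * k if two_sided else 0)
    return stored / (m * n)

m = n = 500  # hypothetical square matrix
k = 20       # hypothetical sketch size
one_sided = memory_fraction(m, n, k)
two_sided = memory_fraction(m, n, k, two_sided=True)
print(one_sided, two_sided)  # 0.04 0.08
```

For non-square data the two-sided cost is $k(m+n)/(mn)$, so the ratio to the one-sided cost $k/m$ is $1 + m/n$ rather than exactly $2$.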

Theorems & Definitions (29)

  • Theorem 1
  • Remark 1: Implementation considerations
  • proof
  • Remark 2
  • Theorem 2
  • proof: Proof of Theorem 2
  • Theorem 3: "Randomized rangefinder algorithm loss", halko2011finding
  • Corollary 1: Data-adapted one-sided sketches
  • proof
  • Remark 3
  • ...and 19 more
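Theorem 3 and Corollary 1 rely on the randomized rangefinder of halko2011finding. A minimal sketch of its basic form follows: one pass over ${\mathbf{X}}$ multiplies it by a Gaussian test matrix, a QR factorization gives an orthonormal basis ${\mathbf{Q}}$, and a second pass forms the compressed data ${\mathbf{Q}}^T{\mathbf{X}}$, matching the "once or twice" data access described in the abstract. Using ${\mathbf{A}} = {\mathbf{Q}}^T$ as the data-adapted one-sided sketch, the oversampling `p`, and the helper name `randomized_rangefinder` are assumptions; the exact construction in Corollary 1 is not shown here.

```python
import numpy as np

def randomized_rangefinder(X, k, p=5, seed=0):
    """Basic randomized rangefinder (halko2011finding).

    Returns Q with k + p orthonormal columns whose span approximates range(X).
    The oversampling p = 5 is a hypothetical default.
    """
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((X.shape[1], k + p))  # Gaussian test matrix
    Y = X @ Omega                                      # first pass over X
    Q, _ = np.linalg.qr(Y)                             # reduced QR: Q is m x (k + p)
    return Q

# Synthetic data with an exact rank-6 nonnegative factorization.
rng = np.random.default_rng(2)
X = rng.random((200, 6)) @ rng.random((6, 150))

Q = randomized_rangefinder(X, k=6)
A = Q.T     # assumed data-adapted one-sided sketch, (k + p) x m
AX = A @ X  # second pass over X: the compressed data
rel_err = np.linalg.norm(X - Q @ AX) / np.linalg.norm(X)
print(rel_err)  # near machine precision because rank(X) <= k
```

When $\operatorname{rank}({\mathbf{X}}) \le k$, ${\mathbf{Q}}{\mathbf{Q}}^T{\mathbf{X}} = {\mathbf{X}}$ almost surely, so the compressed data loses nothing; for approximately low-rank data the error is governed by the tail singular values, which is the kind of bound Theorem 3 quantifies.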