Learning nonnegative matrix factorizations from compressed data

Abraar Chaudhry; Elizaveta Rebrova

Abstract

We propose a flexible and theoretically supported framework for scalable nonnegative matrix factorization. The goal is to find nonnegative low-rank components directly from compressed measurements, accessing the original data only once or twice. We consider compression through randomized sketching methods that can be either adapted to the data or oblivious. We formulate optimization problems that depend only on the compressed data, yet whose solutions recover a nonnegative factorization that closely approximates the original matrix. The defined problems can be approached with a variety of algorithms; in particular, we discuss variations of the popular multiplicative updates method for these compressed problems. We demonstrate the success of our approaches empirically and validate their performance in real-world applications.
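For orientation, the classical (uncompressed) multiplicative updates method that the abstract refers to can be sketched as follows. This is the standard Lee-Seung baseline for the Frobenius loss, not the compressed variants proposed in the paper; the function name `multiplicative_updates`, the small constant `eps`, and the synthetic problem sizes are assumptions made for illustration only.

```python
import numpy as np

def multiplicative_updates(X, r, n_iter=200, eps=1e-12, seed=0):
    """Classical MU for min ||X - U V^T||_F^2 subject to U, V >= 0.

    Hypothetical helper for illustration; eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive initialization keeps all iterates nonnegative.
    U = rng.random((m, r)) + 0.1
    V = rng.random((n, r)) + 0.1
    losses = []
    for _ in range(n_iter):
        # Elementwise multiplicative steps; no step size is needed.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
        losses.append(np.linalg.norm(X - U @ V.T) ** 2)
    return U, V, losses

# Synthetic data with an exact nonnegative rank-4 factorization (assumed sizes).
rng = np.random.default_rng(1)
U0 = rng.random((60, 4))
V0 = rng.random((40, 4))
X = U0 @ V0.T

U, V, losses = multiplicative_updates(X, r=4)
rel_err = np.linalg.norm(X - U @ V.T) / np.linalg.norm(X)
print(f"relative error after {len(losses)} iterations: {rel_err:.3e}")
```

Each iteration of this baseline touches the full matrix `X` twice, which is the cost that working from compressed measurements is meant to avoid.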

Paper Structure (21 sections, 12 theorems, 77 equations, 5 figures, 3 tables)

Introduction
Contributions and outline
Related work on scalable NMF
Notation
Compressed problems with reliable solutions
Two-sided compression
One-sided compression: orthogonal sketching matrices
One-sided compression: nonorthogonal sketching matrices
Nonnegativity in compression
Methods that solve compressed problems
General convergence for sketched multiplicative updates
MU for solving regularized compressed problems
Solving compressed problems with projected gradient descent
Experiments
Exact recovery from compressed measurements is achievable
...and 6 more sections

Key Result

Theorem 1

Suppose ${\mathbf{X}}$ has an exact nonnegative factorization ${\mathbf{X}} = {\mathbf{U}}_0{\mathbf{V}}_0^T$, where ${\mathbf{U}}_0 \in {\mathbb{R}}_+^{m \times r}$, ${\mathbf{V}}_0 \in {\mathbb{R}}_+^{n \times r}$ with $r \leq \min\{n,m\}$. Let ${\mathbf{A}}_1$ and ${\mathbf{A}}_2$ be matrices of si... where $({\mathbf{U}}, {\mathbf{V}} \ge 0)$ means $({\mathbf{U}} \in {\mathbb{R}}_+^{m \times r}, \ldots)$ ...

Figures (5)

Figure 1: NMF recovery of the synthetic data with MU from full data (Uncompressed); from compressed data with one-sided data-adapted sketches using only $4\%$ of original memory (One-sided), and with two-sided sketches using $8\%$ of original memory (Two-sided). In both cases, i.i.d. Gaussian sketching matrices are used.
Figure 2: NMF recovery of the synthetic data with MU. Displays (c,d) show that MU on data-adapted and random two-sided sketched data also tend to the limiting similarity $1.0$. Across all methods, less compression (larger $k$) improves convergence.
Figure 3: Effect of the regularization parameter $\lambda$ on the MU algorithm \ref{eq:one-sided-mu-updates}. The 20News dataset is compressed with data-adapted one-sided measurements; $\sigma$ is chosen minimal so that ${\mathbf{A}}{\mathbf{A}}^T \geq -\sigma$. The absence of regularization compromises convergence, while too strong regularization results in a higher loss.
Figure 4: (a,c): recovery from random Gaussian measurements, averaged over $5$ runs; our two-sided MU methods lead to better convergence than WL \cite{wang2010efficient}. (b,d): data-adapted methods with sketching matrices obtained with the randomized rangefinder algorithm (as in Corollary \ref{cor:exact-one-sided}); our two-sided MU performs slightly better than TS \cite{tepper2016compressed} after enough iterations.
Figure 5: Six "representative" faces from the Faces dataset, learned from a compressed dataset of size $\sim 5\%$ of the initial data. A data-adapted compression matrix ${\mathbf{A}}$ is used.
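
Several captions above refer to data-adapted sketching matrices obtained with the randomized rangefinder algorithm. A minimal sketch of that construction is given below, assuming the sketching matrix is taken as ${\mathbf{A}} = {\mathbf{Q}}^T$, where ${\mathbf{Q}}$ is an orthonormal basis for the range of ${\mathbf{X}}\Omega$ with a Gaussian test matrix $\Omega$. The helper name `rangefinder_sketch`, the problem sizes, and the exactly low-rank synthetic data are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def rangefinder_sketch(X, k, seed=0):
    """Data-adapted one-sided sketch A (k x m) with orthonormal rows.

    Hypothetical helper: basic randomized rangefinder without power iterations.
    """
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((X.shape[1], k))  # Gaussian test matrix, n x k
    Q, _ = np.linalg.qr(X @ Omega)                # Q is m x k with Q^T Q = I_k
    return Q.T

# Synthetic nonnegative matrix of exact rank r (assumed sizes).
rng = np.random.default_rng(1)
m, n, r, k = 200, 150, 5, 10
X = rng.random((m, r)) @ rng.random((r, n))

A = rangefinder_sketch(X, k)
AX = A @ X                                        # compressed data, k x n

# With k >= r and exactly low-rank X, projecting back loses essentially nothing.
rel_err = np.linalg.norm(X - A.T @ AX) / np.linalg.norm(X)
memory_ratio = AX.size / X.size
print(f"relative projection error: {rel_err:.2e}, memory ratio: {memory_ratio:.2%}")
```

Here the sketch needs one pass over `X` to form `X @ Omega` and a second pass to form `A @ X`, in line with the abstract's remark that the original data is accessed only once or twice. Note that `A` itself generally has negative entries, so `AX` is not a nonnegative matrix even though `X` is.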

Theorems & Definitions (29)

Theorem 1
Remark 1: Implementation considerations
proof
Remark 2
Theorem 2
proof: Proof of Theorem \ref{thm:oneside_penalty}
Theorem 3: "Randomized rangefinder algorithm loss", \cite{halko2011finding}
Corollary 1: Data-adapted one-sided sketches
proof
Remark 3
...and 19 more
