GSVD-NMF: Recovering Missing Features in Non-negative Matrix Factorization

Youdong Guo; Timothy E. Holy

TL;DR

This paper introduces GSVD-NMF, a method that proposes new components based on the generalized singular value decomposition (GSVD) to address discrepancies between an initial under-complete NMF result and the SVD of the original matrix.
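The TL;DR's central comparison, between what the original matrix $\mathbf{X}$ does and what an under-complete reconstruction $\mathbf{W}\mathbf{H}$ does, can be pictured with the ellipses of Figure 2. As an illustration only, and not the paper's exact formulation, the sketch below ranks directions $\mathbf{y}$ by the ratio $\mathbf{y}^\top\mathbf{X}\mathbf{X}^\top\mathbf{y} / \mathbf{y}^\top(\mathbf{W}\mathbf{H}\mathbf{H}^\top\mathbf{W}^\top + \epsilon\mathbf{I})\mathbf{y}$ by solving the corresponding symmetric generalized eigenproblem via a Cholesky reduction. Without the ridge term $\epsilon$, the square roots of these eigenvalues would be generalized singular values of the pair $(\mathbf{X}^\top, (\mathbf{W}\mathbf{H})^\top)$. The function name `discrepancy_directions` and the ridge `eps` are assumptions introduced here.

```python
import numpy as np


def discrepancy_directions(X, W, H, eps=1e-6):
    """Rank directions y by y^T X X^T y / y^T (WH (WH)^T + eps I) y.

    Large ratios mark directions that X stretches much more than the
    reconstruction W @ H does. eps is a hypothetical ridge term that keeps
    the (rank-deficient) denominator matrix positive definite.
    Returns eigenvalues in descending order and matching columns of Y.
    """
    A = X @ X.T                               # m x m, symmetric PSD
    R = W @ H
    B = R @ R.T + eps * np.eye(X.shape[0])    # m x m, symmetric PD
    L = np.linalg.cholesky(B)                 # B = L L^T
    Linv = np.linalg.inv(L)
    C = Linv @ A @ Linv.T                     # equivalent standard problem C v = lambda v
    C = (C + C.T) / 2                         # symmetrize against round-off
    vals, V = np.linalg.eigh(C)               # eigenvalues in ascending order
    Y = Linv.T @ V                            # back-transform y = L^{-T} v (B-normalized)
    order = np.argsort(vals)[::-1]            # largest ratio first
    return vals[order], Y[:, order]


# 2x2 toy case in the spirit of Figure 2: the rank-1 reconstruction misses
# the second axis entirely, so that axis should come out on top.
X = np.array([[3.0, 0.0], [0.0, 1.0]])
W = np.array([[1.0], [0.0]])
H = np.array([[3.0, 0.0]])
vals, Y = discrepancy_directions(X, W, H)
print(vals, Y[:, 0] / np.linalg.norm(Y[:, 0]))
```

The returned directions are normalized with respect to the denominator matrix, not to unit length, so divide by the norm before comparing them. Turning such a direction into a non-negative NMF component is a separate step that this sketch does not attempt.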

Abstract

Non-negative matrix factorization (NMF) is an important tool in signal processing, widely used to separate mixed signals into their underlying components. NMF algorithms require the user to choose the number of components in advance, and if the results are unsatisfying, one typically has to start over with a different number of components. To make NMF more interactive and incremental, we introduce GSVD-NMF, a method that proposes new components based on the generalized singular value decomposition (GSVD) to address discrepancies between the initial under-complete NMF results and the SVD of the original matrix. Simulation and experimental results demonstrate that GSVD-NMF often effectively recovers multiple missing components in under-complete NMF, with the recovered NMF solutions frequently reaching better local optima. The results further show that GSVD-NMF is compatible with various NMF algorithms and that directly augmenting components is more efficient than rerunning NMF from scratch with additional components. Because it deliberately starts from an under-complete NMF, GSVD-NMF has the potential to become a recommended approach for a range of general NMF applications.
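The abstract contrasts augmenting an under-complete factorization with rerunning NMF from scratch. To show only the mechanics of augmentation, the sketch below implements plain HALS (hierarchical alternating least squares, one of the algorithms the paper uses) with optional warm-start factors, fits an under-complete model, appends one extra component, and continues from the augmented factors. The function names `hals` and `relative_error`, the random extra component, and the iteration counts are hypothetical choices; GSVD-NMF derives the new component from the GSVD, which this sketch does not reproduce.

```python
import numpy as np


def hals(X, r, n_iter=300, W0=None, H0=None, seed=0, floor=1e-12):
    """Fit X ~ W @ H with W, H >= 0 using hierarchical alternating least squares.

    If W0 and H0 are given they are used as a warm start (for example, an
    under-complete solution augmented with extra components).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) if W0 is None else np.array(W0, dtype=float)
    H = rng.random((r, n)) if H0 is None else np.array(H0, dtype=float)
    for _ in range(n_iter):
        # Closed-form non-negative update of each row of H, with W fixed.
        WtX, WtW = W.T @ X, W.T @ W
        for j in range(r):
            step = (WtX[j] - WtW[j] @ H) / max(WtW[j, j], floor)
            H[j] = np.maximum(floor, H[j] + step)
        # Same update for each column of W, with H fixed.
        XHt, HHt = X @ H.T, H @ H.T
        for j in range(r):
            step = (XHt[:, j] - W @ HHt[:, j]) / max(HHt[j, j], floor)
            W[:, j] = np.maximum(floor, W[:, j] + step)
    return W, H


def relative_error(X, W, H):
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


# Synthetic non-negative rank-3 data; start from an under-complete rank-2 fit.
rng = np.random.default_rng(1)
X = rng.random((30, 3)) @ rng.random((3, 40))
W2, H2 = hals(X, 2)

# Append one extra component (random here, a hypothetical stand-in for a
# GSVD-derived proposal) and continue HALS from the augmented factors.
Wa = np.hstack([W2, rng.random((30, 1))])
Ha = np.vstack([H2, rng.random((1, 40))])
W3, H3 = hals(X, 3, W0=Wa, H0=Ha)

print(relative_error(X, W2, H2), relative_error(X, W3, H3))
```

The warm start keeps the components already found and only asks the solver to settle the new one, which is the intuition behind augmenting instead of restarting.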

Paper Structure

This paper contains 18 sections, 16 equations, 8 figures, 3 tables, and 1 algorithm.

Introduction
Methods
Results
Conclusion
Acknowledgements
Author contributions statement
Additional information
Legends

Figures (8)

Figure 1: The whole pipeline of GSVD-NMF.
Figure 2: Concept of GSVD-based feature recovery for NMF (2$\times$2 case). (a) Under multiplication by $\mathbf{X}$, points in the green disk map to points in the blue ellipse; the key elements of $\mathbf{X}$'s SVD are labeled. (b) An inexact NMF factorization maps to a different ellipse (tan). (c) GSVD-NMF suggests new directions ($\mathbf{y}$) that make the tan ellipse more like the blue ellipse.
Figure 3: A synthetic example illustrating GSVD-NMF feature recovery ($k=1$), displaying $\mathbf{W}$ and $\mathbf{H}$. (a) Ground-truth $\mathbf{W}$ (each line depicts one column) and $\mathbf{H}$ (each line depicts one row) with 10 features. (b) $\mathbf{X}$ generated as $\mathbf{W}\mathbf{H}$ with added Gaussian noise. (c) Standard NMF results (HALS) with 9 components. (d) The generalized singular value spectrum obtained from the final generalized eigenvalue equation. (e) Feature recovery results ($\mathbf{W}_g$, $\mathbf{H}_g$), with the new component in green. (f) Final NMF results ($\mathbf{W}_1$, $\mathbf{H}_1$). (g) Standard NMF results initialized with NNDSVD. Despite using the correct number of components, several features are incompletely separated, and the solution is much worse than panel (f). (h) Fitting error of standard NMF versus GSVD-NMF over 1000 trials, each adding a different random Gaussian noise realization to $\mathbf{W}\mathbf{H}$ (initialization using NNDSVD). The scatter plot compares the relative fitting errors of standard NMF and GSVD-NMF with respect to the original matrix. Each magenta point represents one trial, with its position indicating the relative fitting error of standard NMF (vertical axis) and GSVD-NMF (horizontal axis). The brown line shows the histogram of perpendicular distances from the points to the diagonal, summarizing the overall distribution of error differences. For most tests, GSVD-NMF produces an equal or better fit.
Figure 4: The real-world data sets used for experiments. (a) LCMS1. (b) LCMS2. (c) The amplitude spectrogram of "Mary had a little lamb". (d) The amplitude spectrogram of "Prelude and Fugue No.1 in C major". The colorbar indicates the intensity at each pixel, normalized by the maximum intensity of the matrix. For the LCMS data, intensity corresponds to ion count, so higher values indicate more ions.
Figure 5: Fitting error of standard NMF versus GSVD-NMF on real-world data. Each column corresponds to a different data set and/or number of components recovered by GSVD-NMF. (a) HALS. (b) GCD. (c) ALSGrad. (d) MU. Note that the axes for MU are expanded relative to the other three algorithms. As in Fig. 3(h), the scatter plots compare the relative fitting errors of standard NMF and GSVD-NMF with respect to the original matrix, here across different NMF algorithms. The key difference is that each magenta point represents one random initialization of NMF rather than one noise realization. The brown line shows the histogram of perpendicular distances from the points to the diagonal, summarizing the overall distribution of error differences.
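Figures 3(h) and 5 summarize paired runs as points (GSVD-NMF error on the horizontal axis, standard NMF error on the vertical axis) together with a histogram of perpendicular distances to the diagonal. The sketch below computes that kind of summary. It assumes the relative fitting error is $\|\mathbf{X}-\mathbf{W}\mathbf{H}\|_F / \|\mathbf{X}\|_F$ (the captions do not state the norm) and uses a signed distance that is positive when standard NMF has the larger error; the function names, bin count, and tolerance are hypothetical.

```python
import numpy as np


def relative_fit_error(X, W, H):
    """||X - W H||_F / ||X||_F (assumed definition of relative fitting error)."""
    return np.linalg.norm(X - W @ H, "fro") / np.linalg.norm(X, "fro")


def diagonal_distances(err_gsvd, err_std):
    """Signed perpendicular distance from each point (err_gsvd, err_std) to y = x.

    Points follow the figure layout: GSVD-NMF error on the horizontal axis and
    standard NMF error on the vertical axis. Positive values mean standard
    NMF fit worse than GSVD-NMF on that trial.
    """
    x = np.asarray(err_gsvd, dtype=float)
    y = np.asarray(err_std, dtype=float)
    return (y - x) / np.sqrt(2.0)


def summarize(err_gsvd, err_std, bins=20, tol=1e-12):
    """Histogram of signed distances plus the fraction of trials where
    GSVD-NMF is at least as good as standard NMF."""
    d = diagonal_distances(err_gsvd, err_std)
    counts, edges = np.histogram(d, bins=bins)
    frac_equal_or_better = float(np.mean(d >= -tol))
    return counts, edges, frac_equal_or_better


# Made-up placeholder errors for three trials (not results from the paper).
counts, edges, frac = summarize([0.10, 0.12, 0.11], [0.10, 0.15, 0.10], bins=5)
print(counts.sum(), frac)
```

A small tolerance is used when counting "equal or better" trials so that pairs with identical errors, which sit exactly on the diagonal, are not lost to floating-point noise.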