Algorithms for Non-Negative Matrix Factorization on Noisy Data With Negative Values

Dylan Green, Stephen Bailey

TL;DR

Two algorithms are presented that fit negative data values directly, without clipping or masking, and recover non-negative signals without the positive offset that clipping or masking negative data introduces.

Abstract

Non-negative matrix factorization (NMF) is a dimensionality reduction technique that has shown promise for analyzing noisy data, especially astronomical data. For these datasets, the observed data may contain negative values due to noise even when the true underlying physical signal is strictly positive. Prior NMF work has not treated negative data in a statistically consistent manner, which becomes problematic for low signal-to-noise data with many negative values. In this paper we present two algorithms, Shift-NMF and Nearly-NMF, that can handle both the noise in the input data and the negative values that noise introduces. Both algorithms use the negative data space without clipping, and correctly recover non-negative signals without the positive offset that clipping negative data introduces. We demonstrate this numerically on both simple and more realistic examples, and prove that both algorithms have monotonically decreasing update rules.
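
The positive offset mentioned above can be illustrated with a minimal NumPy sketch. It is not taken from the paper: it assumes a true signal of exactly zero and zero-mean Gaussian noise, and all variable names are hypothetical. Under those assumptions, clipping or masking the negative samples shifts the sample mean upward, while the unmodified data average to approximately zero.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0  # assumed noise standard deviation (illustrative value)

# The true signal is exactly zero; observations are pure zero-mean Gaussian noise.
noisy = rng.normal(loc=0.0, scale=sigma, size=200_000)

clipped = np.clip(noisy, 0.0, None)  # negative values replaced by zero
masked = noisy[noisy > 0]            # negative values discarded

mean_unclipped = noisy.mean()
mean_clipped = clipped.mean()
mean_masked = masked.mean()

# Analytic expectations for zero-mean Gaussian noise n with standard deviation sigma:
expected_clipped = sigma / np.sqrt(2.0 * np.pi)  # E[max(0, n)]
expected_masked = sigma * np.sqrt(2.0 / np.pi)   # E[n | n > 0]

print(f"unclipped mean: {mean_unclipped:+.4f}")
print(f"clipped mean:   {mean_clipped:+.4f} (analytic {expected_clipped:.4f})")
print(f"masked mean:    {mean_masked:+.4f} (analytic {expected_masked:.4f})")
```

Any fit restricted to the clipped or masked samples inherits this bias in regions where the true signal is close to zero, which is the situation the two algorithms are designed to avoid.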

Paper Structure (19 sections, 37 equations, 8 figures)

Figures (8)

  • Figure 1: Results of two weighted NMF templates generated on a toy example. See text for details of the toy example. The top panel shows one representative exposure from the set of 500 as dots, with the noiseless truth and the template-based reconstruction overplotted in dashed blue and solid red respectively. The bottom panel shows the two raw templates, one dotted and one solid, scaled so the maximum value is 1 but preserving the relative scale between the two templates. Notice that both the reconstruction in the upper panel and the templates in the lower panel have a positive vertical offset in the region where the truth is 0, because the templates fit only the positive component of the noisy data.
  • Figure 2: Euclidean distance during training of two templates on the toy problem set out in the Introduction for the first 50 iterations, for a variety of different NMF algorithms. Nearly-NMF is plotted in solid blue, and Shift-NMF is plotted in dotted orange. In dashed red, Shift-NMF was trained with a value of $y$ that is twice the minimum shift required, and in dash-dot purple, $y$ was set to five times the minimum shift. Each test was rerun 10 times with different starting points, to remove the possibility of starting-point bias. Increasing the shift value beyond the minimum slows the convergence of the $\chi^2$ value, while still training to comparable minima given enough iterations.
  • Figure 3: Results of two NMF templates generated on a toy example, for each of the three algorithms: regular weighted NMF, Shift-NMF, and Nearly-NMF. See Introduction for details of the toy example. The top panel shows one representative exposure from the set of 500 as dots, with the noiseless truth and the template-based reconstructions overplotted in varying styles. Note that the reconstructions from the two methods presented in this paper, Shift-NMF and Nearly-NMF, are nearly indistinguishable from the truth, and lie directly on top of it. The next three panels show the two raw templates, scaled so the maximum value is 1 but preserving the relative scale between the two templates. Shift-NMF and Nearly-NMF templates correctly go to zero on the edges, whereas weighted NMF has a vertical offset.
  • Figure 4: An example of the process used to generate the quasar dataset. In both panels three unnormalized spectra are offset vertically from each other by a constant value of 0.04. In the upper panel, three different noisy spectra are plotted in light blue in the observed frame, as the simulated instrument would record them, with their noise-free base spectra overplotted in purple. The three spectra are annotated by their redshift value at the far right of the plot. Note the high prevalence of negative values in the noisy data due to the noise. In the lower panel we show the spectra as they are used in both Shift-NMF and Nearly-NMF, now in the rest frame. The spectra cover different amounts of the wavelength grid, with none of them covering the entirety of the fitting space. The spectra plotted here are not renormalized; renormalization is done before fitting.
  • Figure 5: The top panel shows the sum of the inverse variance weights in each pixel over the wavelength region covered by the templates. The next 5 panels show each of the 5 Nearly-NMF templates, generated on both the noisy and noise-free datasets. The noise-free templates are plotted as a solid line in red, while the noisy templates are dotted in light gray. Templates are plotted on their logarithmic grid, with a logarithmic scaling on the x-axis. The noisy templates still recover most of the features present in the noise-free templates, and agree well even though the noisy data has a significant number of negative values. The regions where the templates are noisiest correspond to the regions where the sum of the weights of the training data is low. Note that NMF-based algorithms do not produce unique factorizations, and these templates are only one of many possible factorizations.
  • ...and 3 more figures
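
The shift value $y$ discussed in the Figure 2 caption can be explored with a short sketch. This excerpt does not state the Shift-NMF update rules, so the code below is an assumption rather than the authors' implementation: it applies a standard weighted multiplicative update after adding the same constant $y$ to both the data and the model, so that every ratio in the update stays non-negative. The function names (`chi2`, `shift_updates`), the toy data, and the parameter `shift_scale` are hypothetical.

```python
import numpy as np

def chi2(X, V, W, H):
    """Weighted squared residual between the data X and the model W @ H."""
    return float(np.sum(V * (X - W @ H) ** 2))

def shift_updates(X, V, n_templates, shift_scale=1.0, n_iter=50, seed=0):
    """Multiplicative updates on shifted data (illustrative assumption only).

    y is shift_scale times the smallest constant that makes X + y non-negative.
    Assumes the weights V are positive so that no denominator reaches zero.
    """
    rng = np.random.default_rng(seed)
    y = shift_scale * max(0.0, -float(X.min()))
    W = rng.uniform(0.1, 1.0, size=(X.shape[0], n_templates))
    H = rng.uniform(0.1, 1.0, size=(n_templates, X.shape[1]))
    Xs = V * (X + y)  # weighted, shifted data: non-negative by construction
    history = [chi2(X, V, W, H)]
    for _ in range(n_iter):
        # The shift appears in both numerator and denominator, so the
        # residual X - W @ H being minimized is unchanged by y.
        H *= (W.T @ Xs) / (W.T @ (V * (W @ H + y)))
        W *= (Xs @ H.T) / ((V * (W @ H + y)) @ H.T)
        history.append(chi2(X, V, W, H))
    return W, H, history

# Toy data (hypothetical): a non-negative rank-2 truth plus Gaussian noise.
rng = np.random.default_rng(1)
noise_sigma = 0.5
truth = rng.uniform(0, 1, (40, 2)) @ rng.uniform(0, 1, (2, 60))
X = truth + rng.normal(0.0, noise_sigma, truth.shape)
V = np.full(X.shape, 1.0 / noise_sigma**2)  # inverse-variance weights

W_min, H_min, hist_min = shift_updates(X, V, 2, shift_scale=1.0)
W_5x, H_5x, hist_5x = shift_updates(X, V, 2, shift_scale=5.0)
print(f"chi2 after 50 iterations, minimum shift: {hist_min[-1]:.1f}")
print(f"chi2 after 50 iterations, 5x shift:      {hist_5x[-1]:.1f}")
```

Under these assumptions the recorded $\chi^2$ history does not increase from one iteration to the next, and the two runs can be compared to see how a larger shift affects the convergence rate on this toy problem.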