Robustness of Minimum-Volume Nonnegative Matrix Factorization under an Expanded Sufficiently Scattered Condition

Giovanni Barbarino, Nicolas Gillis, Subhayan Saha

TL;DR

This work proves that minimum-volume NMF can robustly recover ground-truth factors under noisy data when the ground-truth abundances satisfy an expanded sufficiently scattered condition ($p$-SSC). By linking the estimated and ground-truth factors through a connecting matrix $R$ and carefully bounding perturbations, the authors derive explicit stability results for general $p$-SSC and for the near-separable case ($p$ near 1), showing that recovery remains feasible as long as the noise level is controlled relative to the SSC strength and the conditioning of the ground-truth basis. The analysis leverages dual cone geometry, the $H_p$ construction, and a permutation-approximation argument to establish that the min-vol solution identifies $W^{\#}$ (up to permutation) and enables recovery of $H^{\#}$. These results clarify how data spread within the latent simplex and the expanded SSC contribute to identifiability in the presence of noise, with practical implications for hyperspectral unmixing, topic modeling, and related NMF applications.

Abstract

Minimum-volume nonnegative matrix factorization (min-vol NMF) has been used successfully in many applications, such as hyperspectral imaging, chemical kinetics, spectroscopy, topic modeling, and audio source separation. However, its robustness to noise has been a long-standing open problem. In this paper, we prove that min-vol NMF identifies the ground-truth factors in the presence of noise under a condition referred to as the expanded sufficiently scattered condition, which requires the data points to be sufficiently well scattered in the latent simplex generated by the basis vectors.

Paper Structure

This paper contains 56 sections, 33 theorems, 231 equations, 4 figures.

Key Result

Theorem 1

Under Assumption ass:perturbed_pSSC, there exist absolute constants $C_\varepsilon,C_e>0$ such that, if the level of perturbation $\varepsilon$ satisfies […], then […], where $\|W^\#\|$ is the matrix $\ell_2$-norm of $W^\#$, and $\mathcal{P}_r$ is the set of $r\times r$ permutation matrices.
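To make "recovery up to permutation" concrete, the following minimal sketch compares an estimated basis with a ground-truth basis by searching over all column permutations, that is, over the set $\mathcal{P}_r$. It is an illustration under stated assumptions, not the exact quantity bounded in Theorem 1: the function names are hypothetical, the error is normalized by the norm of the ground-truth matrix, and the Frobenius norm is used for simplicity (it upper-bounds the matrix $\ell_2$-norm).

```python
from itertools import permutations
from math import sqrt


def frobenius_norm(A):
    """Frobenius norm of a matrix stored as a list of rows."""
    return sqrt(sum(x * x for row in A for x in row))


def permute_columns(A, perm):
    """Return A with its columns reordered so that new column k is old column perm[k]."""
    return [[row[j] for j in perm] for row in A]


def permutation_matched_error(W_est, W_true):
    """Smallest relative error between W_est and a column permutation of W_true.

    Assumption: brute-force search over all r! permutations, so this is
    only practical for small r.
    """
    r = len(W_true[0])
    scale = frobenius_norm(W_true)
    best_err, best_perm = float("inf"), None
    for perm in permutations(range(r)):
        W_perm = permute_columns(W_true, perm)
        diff = [[a - b for a, b in zip(row_est, row_perm)]
                for row_est, row_perm in zip(W_est, W_perm)]
        err = frobenius_norm(diff) / scale
        if err < best_err:
            best_err, best_perm = err, perm
    return best_err, best_perm


# Hypothetical 3 x 3 ground-truth basis (illustrative values only).
W_true = [[1.0, 0.0, 0.2],
          [0.0, 1.0, 0.3],
          [0.5, 0.5, 1.0]]

# Estimated basis: columns of W_true shuffled, plus a small perturbation.
shuffled = permute_columns(W_true, (2, 0, 1))
W_est = [[x + 0.01 for x in row] for row in shuffled]

err, perm = permutation_matched_error(W_est, W_true)
print("matched permutation:", perm)
print("relative error:", err)
```

The brute-force search keeps the example dependency-free. For larger $r$, an assignment solver such as `scipy.optimize.linear_sum_assignment` applied to pairwise column distances would be the usual replacement when the error is measured in the Frobenius norm.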

Figures (4)

  • Figure 1: Geometric intuition for SSC on the left, $p$-SSC in the center with $1< p< \sqrt{r-1}$, and separability on the right. Visualization on the unit simplex $\Delta^r$ in the case $r=3$ and for $H$ row stochastic.
  • Figure 2: On the left and center, $\mathcal{C}_p^*,\mathcal{S}_p^*$ and $\mathcal{C}_p, \mathcal{S}_p$ on $\mathcal{E}$ in dimension $r=3$ for $1<p<\sqrt{r-1}$. On the left, the containments between $\mathop{\mathrm{conv}}\nolimits(H^\top)$, $\mathop{\mathrm{conv}}\nolimits^*(H^\top)$, $\mathcal{C}_p\cap\mathcal{E}$ and $\mathcal{C}_p^*\cap \mathcal{E}$ for a row stochastic and $p$-SSC matrix $H$. On the right, the points $v_i$, their convex hull $\mathop{\mathrm{conv}}\nolimits(H_p)$ and $\mathop{\mathrm{conv}}\nolimits(H^\top)$ for a row stochastic and $p$-SSC $H$.
  • Figure 3: Visualization of the sets $\mathcal{P}$, $\mathcal{B}$ and $\widetilde{\mathcal{S}}$; the pink set is $\mathcal{P}\setminus\mathcal{B}$. On the left, the favourable case where the rows of $R$ belong to disjoint regions around the vectors $\tilde{e}_j$, which are close to the unit vectors $e_j$. In the center, the effect of increasing the level of perturbation $\varepsilon$. On the right, the effect of increasing the value of $p$ to almost $\sqrt{r-1}$. Increasing $\varepsilon$ or $p$ too much makes $R$ potentially far from a permutation matrix, since the rows of $R$ can be anywhere in the pink region.
  • Figure 4: Visualization of the spaces introduced in Notation not:geometric_notations2: $\mathcal{T}$, $\widetilde{\mathcal{N}_i}$, $\mathcal{N}_i$, $\mathcal{R}_i$, $\mathcal{T}_i$, $\hat{\mathcal{T}}$.

Theorems & Definitions (54)

  • Remark 1: Nonnegativity of $X$ and $W$
  • Remark 2: Relaxation of the sum-to-one constraint
  • Theorem 1
  • Theorem 2
  • Definition 1
  • Lemma 1
  • Definition 2: Dual Cone
  • Lemma 2
  • Corollary 1
  • Definition 3
  • ...and 44 more