Maximum-Volume Nonnegative Matrix Factorization

Olivier Vu Thanh; Nicolas Gillis

TL;DR

This work introduces Maximum-Volume NMF (MaxVol NMF) as the dual of the traditional MinVol NMF: maximizing the volume of $H$ yields the same identifiability as MinVol NMF in the noiseless setting, but often better sparsity and robustness to noise. It presents two optimization frameworks for solving MaxVol NMF, adaptive accelerated gradient descent and ADMM, along with a normalized variant (N-MaxVol NMF) that mitigates the clustering bias of MaxVol NMF at large $\lambda$. The methods are demonstrated on synthetic data and hyperspectral images, where MaxVol NMF and especially N-MaxVol NMF provide improved endmember separation and sparse abundances, albeit with identifiability caveats for the normalized variant. Overall, the approach offers a principled dual perspective on volume-regularized NMF, with practical gains for hyperspectral unmixing (HU) and related applications.

Abstract

Nonnegative matrix factorization (NMF) is a popular data embedding technique. Given a nonnegative data matrix $X$, it aims at finding two lower-dimensional matrices, $W$ and $H$, such that $X\approx WH$, where the factors $W$ and $H$ are constrained to be element-wise nonnegative. The factor $W$ serves as a basis for the columns of $X$. In order to obtain more interpretable and unique solutions, minimum-volume NMF (MinVol NMF) minimizes the volume of $W$. In this paper, we consider the dual approach, where the volume of $H$ is maximized instead; this is referred to as maximum-volume NMF (MaxVol NMF). MaxVol NMF is identifiable under the same conditions as MinVol NMF in the noiseless case, but it behaves rather differently in the presence of noise. In practice, MaxVol NMF is much more effective at extracting a sparse decomposition and does not generate rank-deficient solutions. In fact, we prove that the solutions of MaxVol NMF with the largest volume correspond to clustering the columns of $X$ into disjoint clusters, while the solutions of MinVol NMF with the smallest volume are rank deficient. We propose two algorithms to solve MaxVol NMF. We also present a normalized variant of MaxVol NMF that exhibits better performance than MinVol NMF and MaxVol NMF, and can be interpreted as a continuum between standard NMF and orthogonal NMF. We illustrate our results in the context of hyperspectral unmixing.
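The link between a large volume of $H$ and clustering can be illustrated numerically. The sketch below is not the paper's algorithm: it assumes the volume is measured by $\log\det(HH^\top + \delta I)$ and that each column of $H$ sums to one, neither of which is stated in this summary. The function name `log_volume` and the three small matrices are hypothetical, chosen only for illustration.

```python
import numpy as np

def log_volume(H, delta=1e-3):
    """Assumed volume measure: log det(H H^T + delta * I) for an r-by-n factor H."""
    r = H.shape[0]
    _, logdet = np.linalg.slogdet(H @ H.T + delta * np.eye(r))
    return logdet

r, n = 3, 6

# Hard clustering: every column of H is a unit vector (two columns per cluster).
labels = np.array([0, 0, 1, 1, 2, 2])
H_cluster = np.zeros((r, n))
H_cluster[labels, np.arange(n)] = 1.0

# Fully mixed: every column is the uniform vector, so H H^T has rank one.
H_uniform = np.full((r, n), 1.0 / r)

# Partially mixed: halfway between the two; columns still sum to one.
H_mixed = 0.5 * H_cluster + 0.5 * H_uniform

vol_cluster = log_volume(H_cluster)
vol_mixed = log_volume(H_mixed)
vol_uniform = log_volume(H_uniform)

print(f"clustering H: {vol_cluster:.3f}")
print(f"mixed H:      {vol_mixed:.3f}")
print(f"uniform H:    {vol_uniform:.3f}")
```

Under these assumptions, the clustering $H$ has the largest log-volume of the three and the rank-one $H$ the smallest, which is consistent with the abstract's statement that the largest-volume solutions correspond to disjoint clusters.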

Paper Structure (24 sections, 2 theorems, 47 equations, 23 figures, 2 tables, 3 algorithms)

Introduction
Motivation: MinVol vs. MaxVol NMF
Two weaknesses on MinVol NMF
MaxVol NMF
Identifiability of MaxVol NMF
Behavior of MaxVol NMF
Solving MaxVol NMF
Adaptive accelerated gradient descent
Alternating direction method of multipliers (ADMM)
Updating $W$
Updating $H$
Updating $Y$
Comparison of the two algorithms
Normalized MaxVol NMF (N-MaxVol NMF)
Solving N-MaxVol NMF
...and 9 more sections

Key Result

Theorem 1

[tatli2021polytopic] Let $X=WH$ be a MaxVol NMF of $X$ of size $r = \operatorname{rank}(X)$, in the sense of the exact MaxVol NMF formulation (eq:exactmaxvol). If $H$ satisfies the sufficiently scattered condition (SSC, Definition 1), then the MaxVol NMF $(W,H)$ of $X$ is essentially unique.
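In the NMF literature, "essentially unique" is usually understood as unique up to permutation and scaling of the rank-one factors. The sketch below spells out that standard notion. The symbols $W'$, $H'$, $\Pi$ and $D$ are introduced here for illustration and are not taken from the paper, whose own definition may fix the scaling through a normalization constraint.

```latex
% Standard notion of essential uniqueness (an assumption here, not quoted from the paper):
% any other MaxVol NMF (W', H') of X of size r differs from (W, H) only by
% an r-by-r permutation matrix \Pi and a diagonal matrix D with positive diagonal entries.
X = WH = W'H'
\quad\Longrightarrow\quad
W' = W \Pi D, \qquad H' = D^{-1} \Pi^{\top} H .
% Consistency check: W'H' = W \Pi D D^{-1} \Pi^{\top} H = W \Pi \Pi^{\top} H = WH.
```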

Figures (23)

Figure 1: Abundance maps and normalized endmembers (from left to right: water, soil and tree) for MinVol on the Samson dataset with $\delta=1$.
Figure 2: Abundance maps and normalized endmembers (from left to right: water, soil and tree, except for $\lambda=50$) for MinVol on the Moffett dataset with $\delta=0.1$.
Figure 3: Abundance maps of MaxVol NMF on Samson, depending on $\lambda$.
Figure 4: ADMM on synthetic dataset with $\epsilon=10^{-3}$
Figure 5: Comparison of algorithms for MaxVol NMF on various datasets
...and 18 more figures

Theorems & Definitions (10)

Definition 1: Sufficiently scattered condition - SSC
Theorem 1
Proof 1
Remark 1
Remark 2
Definition 2: Bregman distance
Definition 3: Relative smoothness [bauschke2017descent]
Corollary 1
Proof 2
Definition 4
