Deep Nonnegative Matrix Factorization with Beta Divergences

Valentin Leplat; Le Thi Khanh Hien; Akwum Onwunta; Nicolas Gillis

TL;DR

This work develops deep nonnegative matrix factorization methods based on β-divergences, with a focus on KL divergence, to extract multi-layered features across domains. It advocates a layer-centric loss for identifiability and proposes two models: (i) a nonregularized deep β-NMF and (ii) a minimum-volume deep KL-NMF with log-determinant regularization. The authors derive multiplicative-update–based algorithms within a block majorization-minimization framework, prove convergence under perturbations, and demonstrate improved layer balance, sparser features, and meaningful hierarchical structures in facial features, topic modeling, and hyperspectral unmixing. The approach shows practical advantages in interpretability and endmember localization, with code available for reproducibility. Collectively, the paper advances regularized deep NMF with β-divergences and provides scalable optimization tools for real-world data.
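To make the phrase "multiplicative updates" concrete, here is a minimal sketch of the classical single-layer KL-NMF multiplicative updates (the Lee-Seung rules). It is not the paper's deep or minimum-volume algorithm; the function names `kl_div` and `kl_nmf_mu`, the random initialization, and the `eps` safeguard are assumptions made for illustration.

```python
import numpy as np

def kl_div(X, Y, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(X | Y), summed over entries."""
    X = np.maximum(X, eps)  # assumption: clamp to avoid log(0)
    Y = np.maximum(Y, eps)
    return float(np.sum(X * np.log(X / Y) - X + Y))

def kl_nmf_mu(X, r, n_iter=200, seed=0, eps=1e-12):
    """Single-layer KL-NMF, X ~ W H, via classical multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + 0.1  # strictly positive random initialization
    H = rng.random((r, n)) + 0.1
    errors = []
    for _ in range(n_iter):
        # H <- H * (W^T (X / WH)) / (W^T 1)
        WH = np.maximum(W @ H, eps)
        H *= (W.T @ (X / WH)) / np.maximum(W.sum(axis=0)[:, None], eps)
        # W <- W * ((X / WH) H^T) / (1 H^T)
        WH = np.maximum(W @ H, eps)
        W *= ((X / WH) @ H.T) / np.maximum(H.sum(axis=1)[None, :], eps)
        errors.append(kl_div(X, W @ H))
    return W, H, errors
```

Because each update multiplies a nonnegative factor by a nonnegative ratio, `W` and `H` stay nonnegative without any projection step, and the recorded KL error should not increase from one iteration to the next.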

Abstract

Deep Nonnegative Matrix Factorization (deep NMF) has recently emerged as a valuable technique for extracting multiple layers of features across different scales. However, all existing deep NMF models and algorithms have primarily centered their evaluation on the least squares error, which may not be the most appropriate metric for assessing the quality of approximations on diverse datasets. For instance, when dealing with data types such as audio signals and documents, it is widely acknowledged that $\beta$-divergences offer a more suitable alternative. In this paper, we develop new models and algorithms for deep NMF using some $\beta$-divergences, with a focus on the Kullback-Leibler divergence. Subsequently, we apply these techniques to the extraction of facial features, the identification of topics within document collections, and the identification of materials within hyperspectral images.
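For readers unfamiliar with the family of $\beta$-divergences mentioned in the abstract, the sketch below evaluates the standard elementwise definition, with the Kullback-Leibler ($\beta=1$) and Itakura-Saito ($\beta=0$) divergences as limiting cases and half the squared Frobenius error at $\beta=2$. The function name `beta_divergence` and the `eps` clamping of zero entries are assumptions for illustration, not taken from the paper's code.

```python
import numpy as np

def beta_divergence(X, Y, beta, eps=1e-12):
    """Sum over entries of the beta-divergence d_beta(X_ij | Y_ij)."""
    # Assumption: clamp entries to eps so logs and divisions are defined.
    X = np.maximum(np.asarray(X, dtype=float), eps)
    Y = np.maximum(np.asarray(Y, dtype=float), eps)
    if beta == 1:  # Kullback-Leibler divergence
        return float(np.sum(X * np.log(X / Y) - X + Y))
    if beta == 0:  # Itakura-Saito divergence
        R = X / Y
        return float(np.sum(R - np.log(R) - 1.0))
    # General case; beta = 2 gives half the squared Frobenius error.
    num = X**beta + (beta - 1.0) * Y**beta - beta * X * Y**(beta - 1.0)
    return float(np.sum(num) / (beta * (beta - 1.0)))
```

For example, `beta_divergence(X, Y, 2)` equals `0.5 * np.sum((X - Y) ** 2)`, which is the least squares error the abstract contrasts against, while `beta = 1.5` corresponds to the intermediate case used in the facial-feature experiments.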

Paper Structure (41 sections, 3 theorems, 46 equations, 13 figures, 5 tables, 2 algorithms)

This paper contains 41 sections, 3 theorems, 46 equations, 13 figures, 5 tables, 2 algorithms.

Introduction
Contribution and outline of the paper
What Deep NMF model to use?
Layer centric vs. data centric
Identifiability of NMF
The sufficiently scattered condition
Minimum-volume NMF
Discussion on identifiability of regularized deep NMF
Deep $\beta$-NMF: models and algorithms
The two proposed deep NMF models
Algorithms for solving the proposed deep $\beta$-NMF models
Algorithm for solving deep $\beta$-NMF without regularization
Update of $H_l$
Update of $W_l$ for $l=1,\ldots,L-1$
Update of $W_L$
...and 26 more sections

Key Result

Theorem 1

[huang2013non] Let $X = W^* H^*$ be a rank-$r$ NMF of $X$, where ${W^*}^\top$ and $H^*$ satisfy the SSC. Then any other rank-$r$ NMF of $X$, $X = WH$, corresponds to $(W^*,H^*)$, up to permutation and scaling of the columns of $W^*$ and rows of $H^*$.
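The "permutation and scaling" ambiguity in Theorem 1 can be checked numerically. The sketch below uses small random nonnegative factors (hypothetical data, and it does not verify the SSC): permuting the columns of $W^*$ and rows of $H^*$ consistently, and rescaling them by inverse positive factors, leaves the product $X$ unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
W_star = rng.random((5, 3))   # hypothetical nonnegative factors
H_star = rng.random((3, 7))
X = W_star @ H_star

perm = np.array([2, 0, 1])          # a permutation of the r = 3 components
scale = np.array([0.5, 2.0, 4.0])   # positive scaling, one value per component

W = W_star[:, perm] * scale            # permute columns, then scale each column
H = H_star[perm, :] / scale[:, None]   # same permutation on rows, inverse scaling

# (W, H) is a different pair of nonnegative factors with the same product.
same_product = np.allclose(W @ H, X)
```

Theorem 1 states that, under the SSC, these trivial transformations are the only way two rank-$r$ NMFs of $X$ can differ.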

Figures (13)

Figure 1: Deep NMF applied to the Urban hyperspectral image, which is an aerial image of a Walmart in Copperas Cove, Texas. For example, the roof top and the parking lot of the store are easy to identify; see the fourth and fifth images in (a), respectively. Using Deep NMF with two layers, we obtain the following: (a) Layer 1 with $r_1 = 6$ contains the abundance maps $H_1$ corresponding to the spectral signatures in $W_1$, and (b) Layer 2 with $r_2 = 2$ contains the abundance maps $H_2 H_1$ corresponding to the spectral signatures in $W_2$. As the factorization unfolds, deep NMF generates denser abundance maps which are combinations of abundance maps from previous layers. Here, the first level extracts 6 materials (including grass, roof tops and dirt, trees, other roof tops, road and dirt), which are merged into vegetation vs. non-vegetation at the second layer.
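The way the layer-2 abundance maps in Figure 1 are formed as $H_2 H_1$ can be illustrated with a shape-only sketch. The matrices below are random placeholders rather than the Urban data, and the sizes `m` and `n` are hypothetical; only the ranks $r_1 = 6$ and $r_2 = 2$ come from the caption.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 12    # hypothetical: number of spectral bands, number of pixels
r1, r2 = 6, 2    # ranks quoted in the Figure 1 caption

# Random nonnegative placeholders standing in for the learned factors.
W1, H1 = rng.random((m, r1)), rng.random((r1, n))
W2, H2 = rng.random((m, r2)), rng.random((r2, r1))

maps_layer1 = H1        # r1 abundance maps, one row per layer-1 material
maps_layer2 = H2 @ H1   # r2 abundance maps, each a nonnegative mix of layer-1 maps
approx_layer2 = W2 @ maps_layer2   # layer-2 approximation of the data, m x n
```

Each row of `maps_layer2` is a nonnegative combination of the rows of `H1`, weighted by the corresponding row of `H2`, which is why the second layer merges materials from the first.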
Figure 2: Evolution of the median errors at the different levels of deep $\beta$-NMF with $\beta=\frac{3}{2}$ (initialized with multilayer $\beta$-NMF after 500 iterations) divided by the error of multilayer $\beta$-NMF after 1000 iterations.
Figure 3: Example of facial features extracted by multilayer $\beta$-NMF vs. deep $\beta$-NMF for $\beta=\frac{3}{2}$.
Figure 4: Evolution of the error at the different levels of deep KL-NMF divided by the error of multilayer KL-NMF.
Figure 5: HSI datasets: Samson (left) - Moffett Field acquired by AVIRIS in 1997 and the region of interest (right) represented in synthetic colors, figure reproduced from [5256272].
...and 8 more figures

Theorems & Definitions (4)

Theorem 1
Theorem 2 [fu2015blind, fu2018identifiability, leplat2020blind]
Example 1: The product of matrices satisfying the SSC typically does not satisfy the SSC
Lemma 1: Majorizer function for $\beta$-NMF [fevotte2011algorithms]
