Sparse NMF with Archetypal Regularization: Computational and Robustness Properties

Kayhan Behdin, Rahul Mazumder

TL;DR

This work tackles sparse nonnegative matrix factorization with archetypal regularization (Sparse AA, or SAA) to obtain interpretable, robust data representations even under model misspecification. It introduces strong and weak robustness notions, defines a distance-based archetype similarity measure, and proves that strong robustness implies weak robustness, with bounds that hold under minimal assumptions. The authors develop a proximal block coordinate descent algorithm, enhanced by a mixed-integer programming (MIP) initialization and local search to promote sparsity and improve solution quality, along with a penalized AA formulation that preserves robustness. Comprehensive synthetic and real-data experiments (faces, cancer gene expression, hyperspectral images, and scene categorization) demonstrate that SAA achieves superior robustness and interpretability compared to existing sparse NMF methods, while maintaining competitive clustering and reconstruction performance. The framework combines theoretical robustness guarantees with practical optimization strategies, enabling robust, sparse archetypal representations in high-dimensional settings.

Abstract

We consider the problem of sparse nonnegative matrix factorization (NMF) using archetypal regularization. The goal is to represent a collection of data points as nonnegative linear combinations of a few nonnegative sparse factors with appealing geometric properties, arising from the use of archetypal regularization. We generalize the notion of robustness studied in Javadi and Montanari (2019) (without sparsity) to the notions of (a) strong robustness, which implies that each estimated archetype is close to the underlying archetypes, and (b) weak robustness, which implies that there exists at least one recovered archetype that is close to the underlying archetypes. Our theoretical results on robustness guarantees hold under minimal assumptions on the underlying data, and apply to settings where the underlying archetypes need not be sparse. We present theoretical results and illustrative examples to strengthen the insights underlying the notions of robustness. We propose new algorithms for our optimization problem, and present numerical experiments on synthetic and real data sets that shed further light on our proposed framework and theoretical developments.
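The contrast between the two robustness notions can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions: it uses the Euclidean distance from each estimated archetype to its nearest true archetype as a stand-in for "closeness", which is not necessarily the distance the paper uses, and the function name `robustness_proxies` and the toy matrices are hypothetical. A strong-type error takes the worst case over estimated archetypes, while a weak-type error takes the best case.

```python
import numpy as np

def robustness_proxies(H_est, H_true):
    """Illustrative proxies only; NOT the paper's formal definitions.

    H_est, H_true: arrays of shape (k, n), one archetype per row.
    """
    # Pairwise Euclidean distances: entry (i, j) is ||H_est[i] - H_true[j]||.
    diffs = H_est[:, None, :] - H_true[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Distance from each estimated archetype to its nearest true archetype.
    nearest = dists.min(axis=1)
    strong = nearest.max()  # small only if EVERY estimated archetype is close
    weak = nearest.min()    # small if AT LEAST ONE estimated archetype is close
    return strong, weak

# Hypothetical toy data: two estimates are near true archetypes, one is far off.
H_true = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
H_est = np.array([[0.1, 0.0], [1.0, 0.0], [3.0, 4.0]])
strong, weak = robustness_proxies(H_est, H_true)
print(f"strong-type error: {strong:.4f}, weak-type error: {weak:.4f}")
```

Because a maximum is never smaller than a minimum over the same values, the weak-type proxy is always bounded by the strong-type proxy here, which mirrors the direction of the statement that strong robustness implies weak robustness.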

Paper Structure

This paper contains 47 sections, 20 theorems, 160 equations, 16 figures, 6 tables, 3 algorithms.

Key Result

Theorem 1

Let us define the quantity Then for any $\boldsymbol{H}\in\mathbb{R}_{\geq 0}^{k\times n}$ we have

Figures (16)

  • Figure 1: In these figures, blue crosses ('+') represent the data points in $\mathbb{R}^2$. We seek to find 3 archetypes such that their convex hull contains the data. Panel (a): the black convex hull (triangle) shows an arbitrary solution to NMF, while the red convex hull shows the exact AA solution that is the smallest triangle containing the data. Panel (b): the black convex hull shows a solution that describes the data with no error, while it is not close to the convex hull of the data. The red convex hull shows the regularized AA solution, which both describes the data well (but with nonzero error) and is close to the data. Panel (c): the red convex hull shows the exact AA solution which does not have any zero coordinate, while the black convex hull shows a solution which is sparse and only has 2 nonzero coordinates. In addition, no other solution with the same sparsity can be found which is closer to the data.
  • Figure 2: Solutions for Example \ref{robustness-newexample}. Panel (a): The true (underlying) archetypes and the noisy data. Panel (b): Noisy data and true archetypes, in addition to the AA solution, which is strongly and weakly robust and is close to the true model. Panels (c,d,e): Noisy data and true archetypes, in addition to some candidate archetypes that contain the noisy data perfectly (i.e., are feasible for problem \ref{archl0noise} with $\ell=kn$). In particular, the candidate archetypes in Panels (c) and (d) have equal weak robustness error, while the one in Panel (c) has a smaller strong robustness error. This shows that strong robustness, unlike weak robustness, can distinguish between the sets of candidate solutions in Panels (c) and (d).
  • Figure 3: Illustration of Example 1: (a) The noiseless data and two examples of noisy data. (b) Details of the example for a specific value of $\theta$.
  • Figure 4: In this figure, blue crosses represent the data points in $\mathbb{R}^2$, and blue circles and lines represent $\boldsymbol{H}_0$ and its convex hull containing the data. Black circles and lines represent two candidate sets of archetypes and their convex hulls. Note that the set that is closer to $\boldsymbol{H}_0$ describes the data better, as anticipated by Theorem \ref{noiseremove}.
  • Figure 5: Effect of varying noise on the performance of different algorithms for the well-specified case in Section \ref{synthetic}.
  • ...and 11 more figures

Theorems & Definitions (34)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Definition 1
  • Example 1
  • Example 2
  • Theorem 1: Strong robustness implies weak robustness
  • Remark 5
  • Theorem 2
  • ...and 24 more