Sparse NMF with Archetypal Regularization: Computational and Robustness Properties
Kayhan Behdin, Rahul Mazumder
TL;DR
This work studies sparse nonnegative matrix factorization with archetypal regularization (sparse archetypal analysis, abbreviated Sparse AA or SAA) to obtain interpretable, robust data representations even under model misspecification. It introduces strong and weak notions of robustness, defines a distance-based archetype similarity measure, and proves that strong robustness implies weak robustness, with bounds that hold under minimal assumptions. The authors develop a proximal block coordinate descent algorithm, enhanced by a mixed-integer programming (MIP) initialization and local search to promote sparsity and improve solution quality, along with a penalized AA formulation that preserves robustness. Experiments on synthetic and real data (faces, cancer gene expression, hyperspectral images, and scene categorization) indicate that SAA is more robust and interpretable than existing sparse NMF methods while remaining competitive in clustering and reconstruction performance. The framework thus pairs theoretical robustness guarantees with practical optimization strategies for high-dimensional settings.
Abstract
We consider the problem of sparse nonnegative matrix factorization (NMF) using archetypal regularization. The goal is to represent a collection of data points as nonnegative linear combinations of a few nonnegative sparse factors with appealing geometric properties, arising from the use of archetypal regularization. We generalize the notion of robustness studied in Javadi and Montanari (2019) (without sparsity) to the notions of (a) strong robustness, which implies that each estimated archetype is close to the underlying archetypes, and (b) weak robustness, which implies that there exists at least one recovered archetype that is close to the underlying archetypes. Our theoretical results on robustness guarantees hold under minimal assumptions on the underlying data, and apply to settings where the underlying archetypes need not be sparse. We present theoretical results and illustrative examples to strengthen the insights underlying the notions of robustness. We propose new algorithms for our optimization problem and present numerical experiments on synthetic and real data sets that offer further insight into our proposed framework and theoretical developments.
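To make the difference between the two robustness notions concrete, the following is a minimal illustrative sketch, not the paper's formal definitions. It assumes that "close to the underlying archetypes" is measured as the Euclidean distance from an estimated archetype to its nearest true archetype; the function names (`nearest_archetype_distances`, `strong_gap`, `weak_gap`) and the toy matrices are hypothetical and introduced only for illustration.

```python
import numpy as np


def nearest_archetype_distances(H_est, H_true):
    """For each row of H_est, the Euclidean distance to the closest row of H_true.

    Assumption: closeness is measured row-to-row in Euclidean norm; the paper's
    own distance measure may differ.
    """
    # Broadcast to shape (k_est, k_true, d), then take norms over the last axis.
    diffs = H_est[:, None, :] - H_true[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=2)
    return pairwise.min(axis=1)


def strong_gap(H_est, H_true):
    """Worst case over estimated archetypes: small only if EVERY one is close."""
    return float(nearest_archetype_distances(H_est, H_true).max())


def weak_gap(H_est, H_true):
    """Best case over estimated archetypes: small if AT LEAST ONE is close."""
    return float(nearest_archetype_distances(H_est, H_true).min())


# Toy example (hypothetical data): one archetype is recovered exactly,
# the other is far from both true archetypes.
H_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
H_est = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])

print("strong gap:", strong_gap(H_est, H_true))
print("weak gap:  ", weak_gap(H_est, H_true))
```

Under these assumptions the weak gap never exceeds the strong gap, since a minimum over the estimated archetypes is at most the maximum over the same set. In the toy example the estimate satisfies the weak criterion (one archetype is recovered) but not the strong one.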
