
Rank Suggestion in Non-negative Matrix Factorization: Residual Sensitivity to Initial Conditions (RSIC)

Marc A. Tunnell, Zachary J. DeBruine, Erin Carrier

TL;DR

A novel approach called Residual Sensitivity to Initial Conditions (RSIC) that suggests potentially multiple ranks of interest by analyzing the sensitivity of the relative residuals to different initializations, providing a more scalable and generalizable solution for rank determination in NMF that does not rely on domain-specific knowledge or assumptions.

Abstract

Determining the appropriate rank in Non-negative Matrix Factorization (NMF) is a critical challenge that often requires extensive parameter tuning and domain-specific knowledge. Traditional methods for rank determination focus on identifying a single optimal rank, which may not capture the complex structure inherent in real-world datasets. In this study, we introduce a novel approach called Residual Sensitivity to Initial Conditions (RSIC) that suggests potentially multiple ranks of interest by analyzing the sensitivity of the relative residuals (e.g., relative reconstruction error) to different initializations. By computing the Mean Coordinatewise Interquartile Range (MCI) of the residuals across multiple random initializations, our method identifies regions where the NMF solutions are less sensitive to initial conditions and potentially more meaningful. We evaluate RSIC on a diverse set of datasets, including single-cell gene expression data, image data, and text data, and compare it against current state-of-the-art rank determination methods. Our experiments demonstrate that RSIC effectively identifies relevant ranks consistent with the underlying structure of the data, outperforming traditional methods in scenarios where those methods are computationally infeasible or less accurate. This approach provides a more scalable and generalizable solution for rank determination in NMF that does not rely on domain-specific knowledge or assumptions.
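The MCI computation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the NMF solver is plain Lee-Seung multiplicative updates (which may differ from the solver used in the paper), the relative residual is assumed to be the entrywise residual (X - WH) divided by the Frobenius norm of X, and the function names nmf_mu, mean_coordinatewise_iqr, and mci_curve are hypothetical.

```python
import numpy as np


def nmf_mu(X, rank, seed, n_iter=200, eps=1e-10):
    """Lee-Seung multiplicative-update NMF (Frobenius loss) from a random start.

    Stand-in solver for illustration; the paper's solver may differ.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # eps guards against division by zero in the update ratios.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def mean_coordinatewise_iqr(X, rank, seeds):
    """Mean over all entries (i, j) of the IQR of the residual across seeds.

    Assumption: relative residual = (X - W @ H) / ||X||_F, entrywise.
    """
    norm = np.linalg.norm(X)  # Frobenius norm for a 2-D array
    residuals = [(X - W @ H) / norm
                 for W, H in (nmf_mu(X, rank, s) for s in seeds)]
    R = np.stack(residuals)                        # shape (n_seeds, m, n)
    q75, q25 = np.percentile(R, [75, 25], axis=0)  # per-coordinate quartiles
    return float(np.mean(q75 - q25))


def mci_curve(X, ranks, n_seeds=10):
    """One MCI value per candidate rank, using seeds 0..n_seeds-1."""
    seeds = range(n_seeds)
    return {k: mean_coordinatewise_iqr(X, k, seeds) for k in ranks}
```

Calling mci_curve(X, range(1, 39)) on a non-negative data matrix yields one MCI value per rank, which corresponds, up to the assumptions above, to the quantity plotted on the y-axis of Figures 2 through 5. Replacing the IQR with the per-coordinate maximum minus minimum would give the error spread illustrated in Figure 1.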

Paper Structure

This paper contains 33 sections, 12 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: The delta between the smallest and largest error at each point in the rank-10 reconstruction of the first face in the Faces dataset.
  • Figure 2: Mean Coordinatewise IQR (MCI) ($y$-axis) vs rank ($x$-axis) for the ALL-AML dataset for ranks 1 through 38. We identify ranks 5, 10, 19, and 23 as "islands of stability" and thereby potential ranks of interest.
  • Figure 3: Mean Coordinatewise IQR (MCI) ($y$-axis) vs rank ($x$-axis) for the Full Digits dataset for ranks 1 through 64. We identify ranks 3, 11, and 21 as "islands of stability" and thereby potential ranks of interest.
  • Figure 4: Mean Coordinatewise IQR (MCI) ($y$-axis) vs rank ($x$-axis) for the Dig0246 dataset for ranks 1 through 64. We identify ranks 4 and 6 as "islands of stability" and thereby potential ranks of interest.
  • Figure 5: Mean Coordinatewise IQR (MCI) ($y$-axis) vs rank ($x$-axis) for the Swimmer dataset for ranks 1 through 64. We identify rank 16 as the only "island of stability" and thereby potential rank of interest.
  • ...and 1 more figure
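The figure captions describe reading "islands of stability" off the MCI-vs-rank curve, but the selection criterion itself is not stated here. As a purely hypothetical heuristic (an assumption, not the paper's rule), one could flag ranks whose MCI is strictly lower than at both neighboring ranks, optionally by a margin min_drop; the function name local_minima_ranks is illustrative.

```python
def local_minima_ranks(mci, min_drop=0.0):
    """Hypothetical heuristic: ranks whose MCI is below both neighbors.

    mci maps rank -> MCI value. A rank is flagged when its value plus
    min_drop is still strictly smaller than the values at the adjacent
    ranks. The smallest and largest ranks are never flagged, since they
    have only one neighbor.
    """
    ranks = sorted(mci)
    picks = []
    for i in range(1, len(ranks) - 1):
        prev_v = mci[ranks[i - 1]]
        cur_v = mci[ranks[i]]
        next_v = mci[ranks[i + 1]]
        if cur_v + min_drop < prev_v and cur_v + min_drop < next_v:
            picks.append(ranks[i])
    return picks


example = {1: 0.5, 2: 0.3, 3: 0.6, 4: 0.2, 5: 0.4}
print(local_minima_ranks(example))  # [2, 4]
```

A strict local-minimum rule is only one way to formalize "islands of stability"; flat plateaus or very shallow dips would need a different criterion, which is why the margin is exposed as a parameter.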