Towards a Fairer Non-negative Matrix Factorization

Lara Kassab, Erin George, Deanna Needell, Haowen Geng, Nika Jafar Nia, Aoxi Li

TL;DR

It is demonstrated that a modification of the objective function, using a min-max formulation, may sometimes offer an improvement in fairness for groups in the population.

Abstract

Recently, there has been a critical need to study fairness and bias in machine learning (ML) algorithms. Since there is clearly no one-size-fits-all solution to fairness, ML methods should be developed alongside bias mitigation strategies that are practical and approachable to the practitioner. Motivated by recent work on "fair" PCA, here we consider the more challenging method of non-negative matrix factorization (NMF), both as a showcase example and as a method that is important in its own right, for topic modeling tasks and for feature extraction in other ML tasks. We demonstrate that a modification of the objective function, using a min-max formulation, may sometimes offer an improvement in fairness for groups in the population. We derive two methods for minimizing the objective, a multiplicative update rule and an alternating minimization scheme, and discuss implementation practicalities. We include a suite of synthetic and real experiments showing how the method may improve fairness, while also highlighting that this may sometimes increase error for some individuals, that fairness is not a rigid definition, and that the choice of method should depend strongly on the application at hand.
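
As a rough illustration of the min-max idea, the following Python/NumPy sketch alternates multiplicative updates for the per-group representations W_g and the shared dictionary H with an update of group weights c. It is a sketch under stated assumptions, not the authors' algorithm: the per-group loss is assumed to be the squared relative error ||X_g - W_g H||_F^2 / ||X_g||_F^2 (the paper's relative reconstruction loss may be defined differently, for example relative to a per-group baseline), the max over groups is rewritten as a max over weights c on the probability simplex, and the "decreasing step size" update of c is implemented here with exponentiated-gradient ascent using step eta0/sqrt(t), a technique chosen for simplicity. Setting exact_c=True instead puts all weight on the currently worst-off group. The names fairer_nmf_sketch and group_losses are hypothetical.

```python
import numpy as np


def group_losses(Xs, Ws, H):
    """Assumed per-group loss: squared relative error ||X_g - W_g H||_F^2 / ||X_g||_F^2."""
    return np.array([
        np.linalg.norm(X - W @ H, "fro") ** 2 / np.linalg.norm(X, "fro") ** 2
        for X, W in zip(Xs, Ws)
    ])


def fairer_nmf_sketch(Xs, r, iters=200, exact_c=False, eta0=1.0, seed=0, eps=1e-12):
    """Hypothetical min-max NMF sketch: min over W, H >= 0 of max_g loss_g.

    max_g loss_g is written as max over c in the simplex of sum_g c_g * loss_g,
    and W, H, c are updated in turn.
    """
    rng = np.random.default_rng(seed)
    n_cols = Xs[0].shape[1]
    Ws = [rng.random((X.shape[0], r)) for X in Xs]
    H = rng.random((r, n_cols))
    c = np.full(len(Xs), 1.0 / len(Xs))
    sq_norms = np.array([np.linalg.norm(X, "fro") ** 2 for X in Xs])
    history = []
    for t in range(1, iters + 1):
        # W_g only appears in group g's term; its positive weight cancels in the
        # ratio, so each block gets a standard Lee-Seung step (also used when c_g = 0).
        Ws = [W * (X @ H.T) / (W @ H @ H.T + eps) for X, W in zip(Xs, Ws)]
        # H is shared: weight each group's contribution by c_g / ||X_g||_F^2.
        alpha = c / sq_norms
        num = sum(a * W.T @ X for a, W, X in zip(alpha, Ws, Xs))
        den = sum(a * W.T @ W for a, W in zip(alpha, Ws)) @ H + eps
        H = H * num / den
        losses = group_losses(Xs, Ws, H)
        history.append(losses)
        if exact_c:
            # Exact inner maximization: all weight on the worst-off group.
            c = np.zeros_like(c)
            c[np.argmax(losses)] = 1.0
        else:
            # Assumed decreasing-step rule: exponentiated-gradient ascent on the simplex.
            c = c * np.exp(eta0 / np.sqrt(t) * losses)
            c /= c.sum()
    return Ws, H, c, np.array(history)
```

For a fixed c, the objective is a weighted least-squares NMF problem, so the W and H steps are ordinary multiplicative updates applied to row-rescaled data and stay non-negative without any projection; only the c step carries the min-max behavior.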

Paper Structure

This paper contains 27 sections, 28 equations, 20 figures, and 3 algorithms.

Figures (20)

  • Figure 1: Illustration of NMF applied to a data matrix $\bm{X}$ that consists of two submatrices $\bm{X}_A$ and $\bm{X}_B$. The matrices $\bm{W}_A$ and $\bm{W}_B$ are the representation matrices corresponding to $\bm{X}_A$ and $\bm{X}_B$, respectively, and $\bm{H}$ is the common dictionary matrix.
  • Figure 2: The Relative Error (%) of each group is reported using standard NMF applied to the full synthetic data matrix (left) and to each group data matrix individually (right). This synthetic data matrix is composed of two groups in orthogonal subspaces: a high-rank ($r=6$) group and a low-rank ($r=3$) group. The mean and standard deviation over 10 trials are reported.
  • Figure 3: The Relative Error (%) of each group is reported using standard NMF applied to the full synthetic data matrix (left) and to each group data matrix individually (right). This synthetic data matrix consists of three groups. Groups 1 and 3 lie in orthogonal subspaces and are each rank 2. Group 2 is constructed from the same non-negative basis as Group 1, with random noise added. The mean and standard deviation over 10 trials are reported.
  • Figure 4: Convergence of the multiplicative update rule for Fairer-NMF on a synthetic data matrix with two groups. The multiplicative update rule is compared when using a decreasing step size for $\bm{c}$ (left) and when optimizing $\bm{c}$ exactly (right). The relative reconstruction loss of each group is reported per iteration, with the mean and standard deviation taken over $100$ trials.
  • Figure 5: Convergence of the multiplicative update rule for Fairer-NMF on a synthetic data matrix with three groups. The multiplicative update rule is compared when using a decreasing step size for $\bm{c}$ (left) and when optimizing $\bm{c}$ exactly (right). The relative reconstruction loss of each group is reported per iteration, with the mean and standard deviation taken over $100$ trials.
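
The setup in Figure 1 (one NMF on the stacked matrix, with a shared dictionary H and per-group blocks of W) can be sketched in Python/NumPy. This is an illustrative sketch, not the paper's code: the baseline solver is assumed to be Lee-Seung multiplicative updates for the Frobenius loss, and each group's relative error is assumed to be ||X_g - W_g H||_F / ||X_g||_F. The helpers nmf_mu and per_group_relative_error are hypothetical names. The synthetic example places two groups on disjoint feature supports as one simple way to make their subspaces orthogonal, loosely echoing Figure 2; it is not the paper's data generator, and no particular error values are implied.

```python
import numpy as np


def nmf_mu(X, r, iters=300, seed=0, eps=1e-12):
    """Standard NMF via Lee-Seung multiplicative updates for ||X - W H||_F^2."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], r))
    H = rng.random((r, X.shape[1]))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def per_group_relative_error(groups, r, **kwargs):
    """Fit one NMF to the stacked matrix, then split W into per-group row blocks."""
    X = np.vstack(groups)                  # X = [X_A; X_B; ...]
    W, H = nmf_mu(X, r, **kwargs)          # H is the shared dictionary
    bounds = np.cumsum([0] + [g.shape[0] for g in groups])
    errors = []
    for g, lo, hi in zip(groups, bounds[:-1], bounds[1:]):
        W_g = W[lo:hi]                     # rows of W belonging to this group
        errors.append(np.linalg.norm(g - W_g @ H, "fro") / np.linalg.norm(g, "fro"))
    overall = np.linalg.norm(X - W @ H, "fro") / np.linalg.norm(X, "fro")
    return np.array(errors), overall


# Hypothetical synthetic example: a rank-6 group and a rank-3 group on disjoint features.
rng = np.random.default_rng(0)
n_feat = 30
X_A = np.hstack([rng.random((40, 6)) @ rng.random((6, n_feat)), np.zeros((40, n_feat))])
X_B = np.hstack([np.zeros((40, n_feat)), rng.random((40, 3)) @ rng.random((3, n_feat))])
errors, overall = per_group_relative_error([X_A, X_B], r=6)
print("per-group relative error:", errors, "overall:", overall)
```

With a total rank budget smaller than the combined rank of the groups, the shared fit has to trade accuracy between groups, which is the kind of imbalance the per-group errors are meant to expose.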

Theorems & Definitions (4)

  • Definition 4.1: Relative Reconstruction Error
  • Definition 5.1: Relative Reconstruction Loss
  • Remark 5.2
  • Remark 5.3