Fair PCA, One Component at a Time

Antonis Matakos, Martino Ciaperoni, Heikki Mannila

TL;DR

This paper introduces Fair-PC, a containment-preserving variant of Min-Max Fair PCA that incrementally constructs an orthonormal sequence of fair principal components by minimizing the worst-group reconstruction error for each rank-1 component. The authors derive a dual formulation showing that each fair component corresponds to the leading eigenvector of a convex combination of group covariances, which enables scalable optimization via Frank-Wolfe and an SDP relaxation. They prove exact optimality and strong duality for the two-group case, $|\boldsymbol{\text{G}}|=2$, and demonstrate empirically that Fair-PC achieves balanced group reconstruction across ranks and outperforms previous Fair PCA approaches in both fairness metrics and runtime. The method retains the containment property of standard PCA, which yields nested fair subspaces and supports practical use in feature selection, while remaining scalable to datasets with multiple groups. Limitations include the lack of formal guarantees for more than two groups and the assumption of fixed group membership.
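
The dual view described in the summary can be written out for a single component under one assumption: that the loss of group $g$ for a unit direction $\mathbf{v}$ is the average squared reconstruction error of centered data. The symbols $X_g$, $n_g$, $S_g$, $\ell_g$, $\boldsymbol{\lambda}$ and $\Delta$ are notation introduced here for illustration and may differ from the paper's.

```latex
% Assumed notation (not from the source): X_g is the n_g x p centered data matrix
% of group g, S_g its covariance, Delta the probability simplex over groups.
\begin{align*}
\ell_g(\mathbf{v}) &= \tfrac{1}{n_g}\,\lVert X_g - X_g \mathbf{v}\mathbf{v}^\top \rVert_F^2
  = \operatorname{tr}(S_g) - \mathbf{v}^\top S_g \mathbf{v},
  \qquad S_g = \tfrac{1}{n_g} X_g^\top X_g, \\
\min_{\lVert \mathbf{v} \rVert = 1} \max_{g} \ell_g(\mathbf{v})
  &\;\ge\; \max_{\boldsymbol{\lambda} \in \Delta} \min_{\lVert \mathbf{v} \rVert = 1}
     \sum_g \lambda_g \ell_g(\mathbf{v})
  = \max_{\boldsymbol{\lambda} \in \Delta}
     \Big[ \sum_g \lambda_g \operatorname{tr}(S_g)
           - \lambda_{\max}\Big( \sum_g \lambda_g S_g \Big) \Big].
\end{align*}
```

For fixed $\boldsymbol{\lambda}$ the inner minimum is attained at a leading eigenvector of $\sum_g \lambda_g S_g$, which matches the "convex combination of group covariances" structure in the summary. The inequality is weak duality in general; according to the summary, it holds with equality for two groups.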

Abstract

The Min-Max Fair PCA problem seeks a low-rank representation of multi-group data such that the approximation error is as balanced as possible across groups. Existing approaches to this problem return a rank-$d$ fair subspace, but lack the fundamental containment property of standard PCA: each rank-$d$ PCA subspace should contain all lower-rank PCA subspaces. To fill this gap, we define fair principal components as directions that minimize the maximum group-wise reconstruction error, subject to orthogonality with previously selected components, and we introduce an iterative method to compute them. This approach preserves the containment property of standard PCA, and reduces to standard PCA for data with a single group. We analyze the theoretical properties of our method and show empirically that it outperforms existing approaches to Min-Max Fair PCA.
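
The one-component-at-a-time idea in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' Frank-Wolfe or SDP solver: it handles two groups only, assumes centered data and an average squared reconstruction error as the group loss, and replaces the optimizer with a plain grid search over the convex weight. The function names `group_losses`, `next_fair_component` and `fair_components` are hypothetical.

```python
import numpy as np

def group_losses(groups, V):
    """Average squared reconstruction error of each group on span(V)."""
    return np.array([
        np.linalg.norm(X - X @ V @ V.T) ** 2 / X.shape[0] for X in groups
    ])

def next_fair_component(groups, V, grid=201):
    """Grid-search heuristic for the next component (two groups only).

    Assumption: candidates are leading eigenvectors of
    lam * S_1 + (1 - lam) * S_2 restricted to the orthogonal
    complement of the components already in V.
    """
    if len(groups) != 2:
        raise ValueError("this sketch handles exactly two groups")
    p = groups[0].shape[1]
    P = np.eye(p) - V @ V.T  # projector onto the complement of span(V)
    covs = [P @ (X.T @ X / X.shape[0]) @ P for X in groups]
    best_v, best_worst = None, np.inf
    for lam in np.linspace(0.0, 1.0, grid):
        M = lam * covs[0] + (1.0 - lam) * covs[1]
        _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
        v = P @ eigvecs[:, -1]          # guard against numerical drift
        v /= np.linalg.norm(v)
        worst = group_losses(groups, np.column_stack([V, v])).max()
        if worst < best_worst:
            best_v, best_worst = v, worst
    return best_v

def fair_components(groups, d, grid=201):
    """Build d orthonormal components greedily, one at a time."""
    p = groups[0].shape[1]
    V = np.zeros((p, 0))
    for _ in range(d):
        V = np.column_stack([V, next_fair_component(groups, V, grid)])
    return V
```

Because each new direction is chosen with the earlier ones held fixed, `fair_components(groups, 1)` is by construction the first column of `fair_components(groups, 2)`, which is the containment property the abstract refers to. The sketch needs `d` smaller than the data dimension and data with variance left in the remaining directions.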

Paper Structure

This paper contains 38 sections, 5 theorems, 42 equations, 6 figures, 2 tables, 3 algorithms.

Key Result

Theorem 5.1

Let $(\mathbf{v}^*, z^*)$ be an optimal solution to Problem prob:faireig_optimization. Then, there exist distinct groups $i \neq j$ such that:

Figures (6)

  • Figure 1: Left (a): synthetic data partitioned in two groups, as indicated by the color of the points. $\{\mathbf{w}_1,\mathbf{w}_2\}$ are the standard principal components while $\{\mathbf{v}_1,\mathbf{v}_2\}$ are the fair principal components given by our method. Right (b): real-world compas dataset partitioned in two groups, females and males. The $y$-axis indicates the ratio of the average group-wise reconstruction error incurred by standard principal components and the fair principal components. The $x$-axis indicates the number of components. We also report the average reconstruction error across both groups (males and females).
  • Figure 2: Results on the compas dataset. Top: two groups. Bottom: three groups. Columns show marginal loss, incremental loss, and $L_2$ reconstruction loss as a function of target rank $d$. Marker symbols indicate different groups.
  • Figure 3: Real-world and synthetic data. Primal and dual optimal objective values as a function of rank for the solution relying on the Frank-Wolfe algorithm. For synthetic data (gaussian-3), the shaded region indicates one standard deviation from the mean across generated datasets.
  • Figure 4: Real-world datasets with two groups. Marginal, incremental, and reconstruction loss by rank. Different marker symbols indicate different groups.
  • Figure 5: communities-4 dataset with four groups. Marginal, incremental and reconstruction loss by rank. Different marker symbols indicate different groups.
  • ...and 1 more figure

Theorems & Definitions (12)

  • Theorem 5.1
  • Lemma 7.1
  • Theorem 7.2
  • Lemma 7.3
  • Definition C.1: Marginal loss
  • proof
  • proof
  • proof
  • proof
  • Lemma F.1
  • ...and 2 more