
When Collaborative Filtering is not Collaborative: Unfairness of PCA for Recommendations

David Liu, Jackie Baek, Tina Eliassi-Rad

TL;DR

The paper investigates the fairness of principal component analysis (PCA) when it is applied to collaborative filtering for recommendations. It reveals two item-level unfairness mechanisms: (i) less popular items rely on trailing PCA components, so a low-rank approximation recovers them poorly, and (ii) leading components can become overly specialized on individual popular items, which limits true collaboration across items. To address these issues, the authors propose Item-Weighted PCA, a convex semidefinite program that up-weights less popular items via weights $w_j = p_j^{\gamma}$ and enforces a hard rank constraint. Vanilla PCA is always a special case, and Normalized PCA is a special case for block-diagonal matrices. The approach comes with theoretical guarantees for stylized matrix classes, and experiments on LastFM and MovieLens show that Item-Weighted PCA mitigates both unfairness mechanisms while achieving competitive or superior downstream recommendation performance compared to PCA baselines. Overall, Item-Weighted PCA offers a principled, convex mechanism for balancing popularity effects in latent representations, improving both item-level fairness and user-facing recommendations. Because it interpolates between the baselines, it balances representation fairness against predictive accuracy and adapts across datasets and popularity regimes.
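To make the role of $\gamma$ concrete, here is a minimal NumPy sketch. It assumes that $p_j$ is the squared norm of item column $j$ (for a 0/1 matrix, the item's interaction count), and it uses a simple column-rescaling heuristic rather than the authors' semidefinite program. The names `popularity_weights` and `column_weighted_pca_projection` are hypothetical. Under these assumptions, $\gamma = 0$ gives vanilla (uncentered) PCA, $\gamma = -1$ rescales every item column to unit norm, and intermediate negative values up-weight less popular items more mildly.

```python
import numpy as np


def popularity_weights(X, gamma):
    """Weights w_j = p_j ** gamma for the item columns of X.

    Assumption (not from the paper): p_j is the squared norm of item column j,
    i.e. the interaction count for a 0/1 matrix. gamma < 0 up-weights less
    popular items.
    """
    p = np.sum(X ** 2, axis=0)
    if np.any(p == 0):
        raise ValueError("every item needs at least one interaction")
    return p ** gamma


def column_weighted_pca_projection(X, r, gamma):
    """Rank-r item-space projection from uncentered PCA on columns scaled by sqrt(w_j).

    Illustrative re-weighting heuristic only; this is NOT the paper's
    semidefinite program.
    """
    w = popularity_weights(X, gamma)
    Xw = X * np.sqrt(w)                       # scale column j by sqrt(w_j)
    _, _, Vt = np.linalg.svd(Xw, full_matrices=False)
    V = Vt[:r].T                              # top-r right singular vectors (m x r)
    return V @ V.T                            # m x m projection matrix of rank r


# Hypothetical usage on a random 0/1 user-item matrix.
rng = np.random.default_rng(0)
X = (rng.random((100, 12)) < 0.2).astype(float)
X[0] = 1.0                                    # ensure no empty item column
P_vanilla = column_weighted_pca_projection(X, r=4, gamma=0.0)
P_normalized = column_weighted_pca_projection(X, r=4, gamma=-1.0)
P_between = column_weighted_pca_projection(X, r=4, gamma=-0.5)
```

The paper's method differs from this sketch: it solves a convex program with a hard rank constraint. Treat the snippet only as an illustration of how the weights rescale items.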

Abstract

We study the fairness of dimensionality reduction methods for recommendations. We focus on the fundamental method of principal component analysis (PCA), which identifies latent components and produces a low-rank approximation via the leading components while discarding the trailing components. Prior works have defined notions of "fair PCA"; however, these definitions do not answer the following question: why is PCA unfair? We identify two underlying popularity mechanisms that induce item unfairness in PCA. The first negatively impacts less popular items because less popular items rely on trailing latent components to recover their values. The second negatively impacts highly popular items, since the leading PCA components specialize in individual popular items instead of capturing similarities between items. To address these issues, we develop a polynomial-time algorithm, Item-Weighted PCA, that flexibly up-weights less popular items when optimizing for leading principal components. We theoretically show that PCA, in all cases, and Normalized PCA, in cases of block-diagonal matrices, are instances of Item-Weighted PCA. We empirically show that there exist datasets for which Item-Weighted PCA yields the optimal solution while the baselines do not. In contrast to past dimensionality reduction re-weighting techniques, Item-Weighted PCA solves a convex optimization problem and enforces a hard rank constraint. Our evaluations on real-world datasets show that Item-Weighted PCA not only mitigates both unfairness mechanisms, but also produces recommendations that outperform those of PCA baselines.
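The first mechanism can be measured item by item. The sketch below is a hedged illustration, not the authors' code. It computes each item's normalized reconstruction error $\|X_{:,j} - (XP_k)_{:,j}\|^2 / \|X_{:,j}\|^2$ as the rank $k$ grows, where $P_k$ projects onto the top-$k$ uncentered principal components, and it reports the smallest $k$ at which that error falls to half of its rank-0 value. Uncentered PCA and this error definition are assumptions, and the helper names are hypothetical. The computation relies on the fact that, with the thin SVD $X = U S V^\top$, component $i$ recovers exactly $s_i^2 V_{ji}^2$ of item $j$'s squared norm.

```python
import numpy as np


def item_error_curves(X):
    """curves[k, j] = normalized squared error of item column j at rank k.

    With the thin SVD X = U diag(s) V^T, component i contributes
    s_i^2 * V[j, i]^2 to ||X[:, j]||^2, so each error is 1 minus a
    cumulative share. Assumes uncentered PCA and no all-zero item columns.
    """
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    contrib = (s[:, None] ** 2) * (Vt ** 2)       # contrib[i, j] = s_i^2 * V[j, i]^2
    norms = contrib.sum(axis=0)                   # equals ||X[:, j]||^2
    recovered = np.cumsum(contrib, axis=0) / norms
    zero_rank = np.zeros((1, X.shape[1]))         # rank 0 recovers nothing
    return 1.0 - np.vstack([zero_rank, recovered])


def rank_to_halve_error(curves):
    """Smallest rank k at which each item's normalized error is <= 0.5."""
    return np.argmax(curves <= 0.5, axis=0)


# Hypothetical usage: item popularity decreases from left to right.
rng = np.random.default_rng(0)
X = (rng.random((200, 30)) < np.linspace(0.5, 0.02, 30)).astype(float)
X[0] = 1.0                                        # ensure no empty item column
curves = item_error_curves(X)
print(rank_to_halve_error(curves))
```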

Paper Structure (50 sections, 5 theorems, 9 equations, 8 figures, 4 tables)

Key Result

Theorem 1

Let $P_n \in \mathbb{R}^{m_n \times m_n}$ be the projection matrix obtained by performing PCA on the matrix $X_n$ and keeping the top $M_n$ principal components. Then $\|P_n - I_{n,M_n}\|_F \to 0$ as $n \to \infty$.
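This excerpt does not define the matrix family $X_n$ or the target $I_{n,M_n}$, so the following is only a hand-checkable illustration of a fully specialized PCA projection, not the theorem's setting. Suppose every user interacts with exactly one item. Then the item columns are orthogonal and $X^\top X$ is diagonal, with the item popularities on the diagonal. When the popularities are distinct, the top-$M$ principal components are the standard basis vectors of the $M$ most popular items. The projection is therefore a 0/1 diagonal matrix: each leading component serves a single item, and every off-diagonal (collaborative) entry is zero. The data and the helper name `pca_projection` are hypothetical.

```python
import numpy as np


def pca_projection(X, M):
    """Projection onto the top-M (uncentered) principal components in item space."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:M].T
    return V @ V.T


# Hypothetical data: item j has counts[j] users and each user has one item,
# so the item columns of X are orthogonal and X^T X = diag(counts).
counts = [5, 4, 3, 2, 1]
X = np.vstack([np.eye(len(counts))[j] for j, c in enumerate(counts) for _ in range(c)])

P = pca_projection(X, M=2)
print(np.round(np.diag(P), 6))      # 1 for the two most popular items, 0 otherwise
off_diagonal = P - np.diag(np.diag(P))
print(np.abs(off_diagonal).max())   # numerically zero: no item-item collaboration
```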

Figures (8)

  • Figure 1: To show the unfairness of PCA for recommendations, we run PCA on the LastFM dataset. Subfigure (A) shows the normalized item error as a function of the rank for eight artists, with the overall error shown as a dotted line. Although the overall PCA approximation shows diminishing returns as the rank increases, specific components are critical for improving the approximation quality of individual items. Subfigure (B) shows the first unfairness mechanism: less popular items rely on trailing principal components. The plot shows that high-popularity artists require fewer components to halve the initial approximation error, while less popular artists rely on trailing components. Subfigure (C) shows the second unfairness mechanism: PCA components specialize in individual items rather than collaborating across items. The diagonal values of PCA projection matrices indicate the degree of specialization.
  • Figure 2: Item-Weighted PCA mitigates the unfairness mechanism identified in Vanilla PCA, in which leading components specialize in individual popular items. The left plots show that Item-Weighted PCA reduces our specialization metric relative to Vanilla PCA, and the right plots show that the diagonal entries decrease, in particular for high-popularity items. As expected, Item-Weighted PCA interpolates between Vanilla and Normalized PCA in our in-sample evaluation.
  • Figure 3: Compared to Vanilla PCA, Item-Weighted PCA also increases collaboration, as measured by our in-sample metrics Item AUC-ROC and Precision@$k$. These metrics measure whether $P$ contains useful item-item similarities for recovering user preferences. As in Figure 2, the in-sample performance of Item-Weighted PCA lies between the two baselines because our method interpolates between them. In the high-rank regime, collaboration decreases sharply because the projection matrix can afford to specialize, and the off-diagonal (collaborative) entries of $P$ approach zero.
  • Figure 4: On standard out-of-sample user recommendation metrics, Item-Weighted PCA improves peak recommendation performance (the best value over all evaluated values of $r$) on Recall@$20$ and NDCG@$20$. Although Item-Weighted PCA shows less in-sample collaboration than Normalized PCA, its downstream performance is stronger, especially on MovieLens. This result shows that Item-Weighted PCA balances in-sample collaboration with downstream performance. Results for Precision@$k$ and MRR@$k$, as well as a comparison with additional baselines, are reported in separate sections of the paper.
  • Figure 5: The line shows the average smallest squared singular value of the random Bernoulli matrix $X$, and the shaded region spans the 1st to 99th percentiles.
  • ...and 3 more figures
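For readers unfamiliar with the out-of-sample metrics in Figure 4, here is a minimal sketch of Recall@$k$ and NDCG@$k$ for a single user, using the common binary-relevance definitions. As an illustration rather than the paper's protocol, it assumes that a user's scores are their training row multiplied by an item-item matrix $P$, that already-seen items are excluded, and that the relevant items are the user's held-out interactions. Some implementations divide Recall@$k$ by $\min(k, |\text{relevant}|)$ instead. The function names and the toy matrix are hypothetical.

```python
import numpy as np


def user_scores(x_train, P):
    """Score items as x_train @ P, excluding items the user has already seen."""
    scores = np.asarray(x_train @ P, dtype=float).copy()
    scores[x_train > 0] = -np.inf
    return scores


def recall_ndcg_at_k(scores, relevant, k=20):
    """Binary-relevance Recall@k and NDCG@k for one user.

    scores: 1-D array of item scores; relevant: non-empty set of held-out items.
    """
    if not relevant:
        raise ValueError("user has no held-out items")
    top = np.argsort(-scores, kind="stable")[:k]            # highest scores first
    hits = np.array([1.0 if j in relevant else 0.0 for j in top])
    recall = float(hits.sum()) / len(relevant)
    discounts = 1.0 / np.log2(np.arange(2, len(top) + 2))    # 1 / log2(rank + 1)
    dcg = float(np.sum(hits * discounts))
    idcg = float(np.sum(discounts[: min(k, len(relevant))]))
    return recall, dcg / idcg


# Hypothetical usage: 4 items, the user saw item 1 and held out items 2 and 3.
x_train = np.array([0.0, 1.0, 0.0, 0.0])
P = np.array([[1.0, 0.9, 0.1, 0.0],     # toy item-item matrix standing in for P
              [0.9, 1.0, 0.2, 0.6],
              [0.1, 0.2, 1.0, 0.3],
              [0.0, 0.6, 0.3, 1.0]])
s = user_scores(x_train, P)
# Item 3 is ranked 2nd and item 2 misses the top 2, so Recall@2 is 0.5.
print(recall_ndcg_at_k(s, {2, 3}, k=2))
```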

Theorems & Definitions (11)

  • Definition 1
  • Remark 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Claim 4
  • Claim 5
  • Theorem 6
  • Theorem 7
  • Claim 8
  • ...and 1 more