When Collaborative Filtering is not Collaborative: Unfairness of PCA for Recommendations
David Liu, Jackie Baek, Tina Eliassi-Rad
TL;DR
The paper investigates why principal component analysis (PCA) is unfair when applied to collaborative filtering for recommendations, and identifies two item-level mechanisms: (i) less popular items rely on trailing PCA components, so their values are poorly recovered once those components are discarded, and (ii) leading components can specialize in individual popular items instead of capturing similarities between items, which limits true collaboration. To address these issues, the authors propose Item-Weighted PCA, a convex semidefinite program that up-weights less popular items via weights $w_j = p_j^{\gamma}$ and enforces a hard rank constraint. Vanilla PCA is always a special case, and Normalized PCA is a special case for block-diagonal matrices. The approach comes with theoretical guarantees for stylized matrix classes, and experiments on LastFM and MovieLens show that Item-Weighted PCA mitigates both unfairness mechanisms while producing recommendations that are competitive with or better than those of the PCA baselines. Overall, Item-Weighted PCA offers a principled, convex way to balance popularity effects in latent representations, and its interpolation between the two baselines provides flexibility across datasets and popularity regimes.
Abstract
We study the fairness of dimensionality reduction methods for recommendations. We focus on the fundamental method of principal component analysis (PCA), which identifies latent components and produces a low-rank approximation via the leading components while discarding the trailing components. Prior works have defined notions of "fair PCA"; however, these definitions do not answer the following question: why is PCA unfair? We identify two underlying popularity mechanisms that induce item unfairness in PCA. The first negatively impacts less popular items because less popular items rely on trailing latent components to recover their values. The second negatively impacts highly popular items, since the leading PCA components specialize in individual popular items instead of capturing similarities between items. To address these issues, we develop a polynomial-time algorithm, Item-Weighted PCA, that flexibly up-weights less popular items when optimizing for leading principal components. We theoretically show that PCA, in all cases, and Normalized PCA, in cases of block-diagonal matrices, are instances of Item-Weighted PCA. We empirically show that there exist datasets for which Item-Weighted PCA yields the optimal solution while the baselines do not. In contrast to past dimensionality reduction re-weighting techniques, Item-Weighted PCA solves a convex optimization problem and enforces a hard rank constraint. Our evaluations on real-world datasets show that Item-Weighted PCA not only mitigates both unfairness mechanisms, but also produces recommendations that outperform those of PCA baselines.
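The first mechanism can be illustrated on a toy example. The sketch below is not the authors' code or experimental setup: it assumes a binary, block-diagonal user-item matrix with made-up sizes, and it treats PCA as an uncentered truncated SVD. The helper names `pca_project` and `per_item_relative_error` are hypothetical.

```python
import numpy as np

# Toy user-item matrix (rows = users, columns = items).
# All sizes are illustrative assumptions, not values from the paper.
n_pop_users, n_niche_users, items_per_block = 80, 20, 4
X = np.zeros((n_pop_users + n_niche_users, 2 * items_per_block))
X[:n_pop_users, :items_per_block] = 1.0   # popular items, liked by 80 users
X[n_pop_users:, items_per_block:] = 1.0   # niche items, liked by 20 users


def pca_project(X, r):
    """Project X onto its r leading right singular vectors (uncentered PCA)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_r = Vt[:r].T                         # shape: (n_items, r)
    return X @ V_r @ V_r.T


def per_item_relative_error(X, X_hat):
    """Column-wise reconstruction error, normalized by each column's norm."""
    return np.linalg.norm(X - X_hat, axis=0) / np.linalg.norm(X, axis=0)


err_rank1 = per_item_relative_error(X, pca_project(X, 1))
err_rank2 = per_item_relative_error(X, pca_project(X, 2))

print("rank 1:", np.round(err_rank1, 3))
print("rank 2:", np.round(err_rank2, 3))
```

In this toy matrix the leading component aligns with the popular block, so at rank 1 the popular items are recovered exactly while the niche items have relative error 1. The niche items are recovered only once the second (trailing) component is kept.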
