Privacy-Utility-Bias Trade-offs for Privacy-Preserving Recommender Systems
Shiva Parsarad, Isabel Wagner
TL;DR
This study examines how differential privacy (DP) affects utility and bias in recommender systems by systematically evaluating two DP mechanisms (DPSGD and LDP) across four model families (SVD, BPR, NCF, VAE) on MovieLens-1M and Yelp. It shows that DP’s impact is highly model- and data-dependent: NCF often maintains strong utility under DPSGD, while VAE is the most vulnerable to privacy noise in sparse settings. The work quantifies bias across multiple metrics (miscalibration, popularity bias, novelty, coverage, and producer fairness) and reveals nuanced privacy–bias dynamics, including conditions under which privacy reduces head-tail disparities and others under which it preserves baseline biases. In practice, the results suggest choosing DP configurations and model architectures according to the relative priority of privacy, utility, and fairness, and point to calibration-focused post-processing and user-level DP as avenues toward more robust deployments.
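Several of the bias metrics named above can be computed directly from top-k recommendation lists. The sketch below is illustrative only and assumes common textbook definitions, not necessarily the ones used in this study: item popularity is the fraction of users who interacted with an item, catalog coverage is the share of catalog items that appear in at least one list, average recommendation popularity (ARP) is the mean popularity of recommended items, and novelty is the mean self-information −log2(popularity). All function names are hypothetical.

```python
import math
from collections import Counter


def item_popularity(interactions):
    """Fraction of users who interacted with each item (assumed definition)."""
    unique = set(interactions)  # drop repeated (user, item) pairs
    users_per_item = Counter(item for _, item in unique)
    n_users = len({user for user, _ in unique})
    return {item: count / n_users for item, count in users_per_item.items()}


def catalog_coverage(rec_lists, catalog):
    """Share of catalog items recommended to at least one user."""
    recommended = {item for recs in rec_lists.values() for item in recs}
    return len(recommended & set(catalog)) / len(catalog)


def avg_rec_popularity(rec_lists, pop):
    """Mean popularity of recommended items, averaged over users (ARP)."""
    per_user = [
        sum(pop.get(item, 0.0) for item in recs) / len(recs)
        for recs in rec_lists.values()
        if recs
    ]
    return sum(per_user) / len(per_user)


def novelty(rec_lists, pop):
    """Mean self-information -log2(pop); items never seen in training are skipped."""
    per_user = []
    for recs in rec_lists.values():
        scores = [-math.log2(pop[item]) for item in recs if pop.get(item, 0.0) > 0]
        if scores:
            per_user.append(sum(scores) / len(scores))
    return sum(per_user) / len(per_user)


# Toy data: three users, four catalog items.
interactions = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u3", "a"), ("u3", "c")]
catalog = ["a", "b", "c", "d"]
rec_lists = {"u1": ["a", "c"], "u2": ["a", "b"]}

pop = item_popularity(interactions)
coverage = catalog_coverage(rec_lists, catalog)
arp = avg_rec_popularity(rec_lists, pop)
nov = novelty(rec_lists, pop)
```

Under these definitions, a drop in ARP together with a rise in coverage and novelty indicates recommendations shifting from head items toward the long tail.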
Abstract
Recommender systems (RSs) output ranked lists of items, such as movies or restaurants, that users may find interesting, based on the user's past ratings and the ratings of other users. RSs increasingly incorporate differential privacy (DP) to protect user data, raising questions about how privacy mechanisms affect both recommendation accuracy and fairness. We conduct a comprehensive, cross-model evaluation of two DP mechanisms, differentially private stochastic gradient descent (DPSGD) and local differential privacy (LDP), applied to four recommender systems (Neural Collaborative Filtering (NCF), Bayesian Personalized Ranking (BPR), Singular Value Decomposition (SVD), and Variational Autoencoder (VAE)) on the MovieLens-1M and Yelp datasets. We find that stronger privacy consistently reduces utility, but the size of the loss varies by model. NCF under DPSGD shows the smallest accuracy loss (under 10% at ε ≈ 1), whereas SVD and BPR experience larger drops, especially for users with niche preferences. VAE is the most sensitive to privacy, with sharp declines for sparsely represented groups. The impact on bias metrics is similarly heterogeneous: DPSGD generally narrows the gap between recommendations of popular and less popular items, whereas LDP more closely preserves existing patterns. These results show that no single DP mechanism is uniformly superior; instead, each involves different trade-offs depending on the privacy regime and data conditions.
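To make the contrast between the two mechanisms concrete, the sketch below shows where the noise enters in each. DPSGD perturbs training: each per-example gradient is clipped to an L2 norm bound, the clipped gradients are summed, and Gaussian noise scaled by the clipping bound is added before the update. LDP perturbs the data on the user's side before it is collected; here we assume binary randomized response on implicit-feedback bits, which may differ from the LDP mechanism evaluated in this study. The names `noise_multiplier`, `max_norm`, and the helper functions are hypothetical, and the mapping from `noise_multiplier` to an overall ε (which requires a privacy accountant) is not shown.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DPSGD update: clip per example, sum, add Gaussian noise, average."""
    dim = len(params)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    sigma = noise_multiplier * max_norm  # noise std scales with the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch = len(per_example_grads)
    return [p - lr * n / batch for p, n in zip(params, noisy)]


def randomized_response(bits, epsilon, rng):
    """Assumed LDP mechanism: keep each bit with prob e^eps / (1 + e^eps), else flip it."""
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return [b if rng.random() < keep_prob else 1 - b for b in bits]


rng = random.Random(0)
# With noise_multiplier=0 the step is deterministic: clipping only.
new_params = dpsgd_step([0.0, 0.0], [[3.0, 4.0], [0.0, 1.0]], lr=1.0,
                        max_norm=1.0, noise_multiplier=0.0, rng=rng)
user_bits = [1, 0, 0, 1, 1, 0]
reported = randomized_response(user_bits, epsilon=1.0, rng=rng)
```

Because DPSGD adds noise to aggregated gradients, the model can still learn broad structure, whereas randomized response corrupts each user's individual record; this is one plausible reason the two mechanisms affect bias patterns differently.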
