Making Alice Appear Like Bob: A Probabilistic Preference Obfuscation Method For Implicit Feedback Recommendation Models

Gustavo Escobedo, Marta Moscati, Peter Muellner, Simone Kopeinik, Dominik Kowald, Elisabeth Lex, Markus Schedl

TL;DR

The paper tackles privacy leakage in implicit-feedback recommender systems, where user interactions correlate with protected attributes. It introduces Stereotypicality-Based Obfuscation (SBO), a probabilistic method that lowers the stereotypicality of user profiles by selectively obfuscating the items most strongly associated with stereotypes, using the item- and user-level metrics IGI and $I_\text{Ster}$. The amount of obfuscation is controlled by a ratio $\rho$, and the items to obfuscate are sampled through Bernoulli trials. SBO is evaluated with three recommender models (BPR-MF, LightGCN, MultVAE) on MovieLens-1M and LFM-2b-100k. It improves privacy (lower attacker accuracy) with only modest drops in utility (NDCG@10), and it often outperforms PerBlur, a state-of-the-art obfuscation method. The work demonstrates that guiding the obfuscation of profile items by stereotypicality metrics yields favorable privacy-utility trade-offs. Possible future extensions include covering more protected attributes and addressing membership inference.
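The sampling idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the definitions of IGI and $I_\text{Ster}$ are not given in this summary, so the per-item scores are assumed to be precomputed, and the function name `obfuscate_profile`, the removal-only strategy, and the normalization of scores into probabilities are all hypothetical choices made for illustration.

```python
import random


def obfuscate_profile(profile, item_scores, rho, rng):
    """Drop at most int(rho * len(profile)) items, favouring stereotypical ones.

    profile:     list of item ids the user interacted with
    item_scores: dict mapping item id -> stereotypicality score (assumed given)
    rho:         obfuscation ratio in [0, 1]
    rng:         random.Random instance used for the Bernoulli trials
    """
    budget = int(rho * len(profile))
    if budget == 0:
        return list(profile)

    def score(v):
        # Hypothetical choice: use the absolute score, defaulting to 0.
        return abs(item_scores.get(v, 0.0))

    # Visit items from most to least stereotypical.
    ranked = sorted(profile, key=score, reverse=True)
    max_score = score(ranked[0])

    removed = set()
    for v in ranked:
        if len(removed) >= budget:
            break
        # Bernoulli trial: success probability is the normalised score.
        p = score(v) / max_score if max_score > 0 else 0.0
        if rng.random() < p:
            removed.add(v)

    # Keep the original interaction order for the surviving items.
    return [v for v in profile if v not in removed]
```

With this sketch, the most stereotypical item in a profile is always a removal candidate with probability 1, while items with a score of 0 are never removed. The ratio `rho` only caps how many items may be dropped, so fewer items than the budget may actually be removed.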

Abstract

Users' interaction or preference data used in recommender systems carry the risk of unintentionally revealing users' private attributes (e.g., gender or race). This risk becomes particularly concerning when the training data contains user preferences that can be used to infer these attributes, especially if they align with common stereotypes. This major privacy issue allows malicious attackers or other third parties to infer users' protected attributes. Previous efforts to address this issue have added or removed parts of users' preferences prior to or during model training to improve privacy, which often leads to decreases in recommendation accuracy. In this work, we introduce SBO, a novel probabilistic obfuscation method for user preference data designed to improve the accuracy-privacy trade-off for such recommendation scenarios. We apply SBO to three state-of-the-art recommendation models (i.e., BPR, MultVAE, and LightGCN) and two popular datasets (i.e., MovieLens-1M and LFM-2B). Our experiments reveal that SBO outperforms comparable approaches with respect to the accuracy-privacy trade-off. Specifically, we can reduce the leakage of users' protected attributes while maintaining on-par recommendation accuracy.

Paper Structure (23 sections, 2 equations, 6 figures, 3 tables, 1 algorithm)

Figures (6)

  • Figure 1: Distribution of item stereotypicality $I_\text{Ster}(v,U_g,U_{g'})$ with $U_g=U_m$ and $U_{g'}=U_f$ over the items of the LFM-2b-100k (left) and Ml-1m (right) datasets.
  • Figure 2: User group stereotypicality of users from the LFM-2b-100k and Ml-1m datasets, with users in order of descending stereotypicality. The red dotted and green dotted lines indicate the selection threshold $U_\text{Ster}^\text{mean}$ used for LFM-2b-100k and Ml-1m, respectively.
  • Figure 3: Performance of the RSs and attacker (NDCG@10 and BAcc) with different obfuscation strategies on (a) Ml-1m and (b) LFM-2b-100k. The dotted lines indicate the performances on the datasets without any obfuscation in place.
  • Figure 4: Performance of the RSs and attacker (NDCG@10 and BAcc) with different sampling methods on (a) Ml-1m and (b) LFM-2b-100k. The dotted lines indicate the performances on the datasets without any obfuscation in place.
  • Figure 5: Performance of the RSs and attacker (NDCG@10 and BAcc) with different obfuscation strategies on (a) Ml-1m and (b) LFM-2b-100k using the obfuscation ratio $\rho=0.05$. The dotted lines indicate the performances on the datasets without any obfuscation in place.
  • ...and 1 more figure