RAID: An In-Training Defense against Attribute Inference Attacks in Recommender Systems

Xiaohua Feng, Yuyuan Li, Fengyuan Yu, Ke Xiong, Junjie Fang, Li Zhang, Tianyu Du, Chaochao Chen

TL;DR

Attribute inference attacks threaten user privacy in recommender systems by exploiting exposed embeddings. RAID is an in-training defense whose defensive objective makes the class distributions of a protected attribute indistinguishable: it steers them toward a centroid distribution $\mathcal{P}^*$, obtained from a constrained Wasserstein-2 barycenter problem, and aligns users with it via optimal transport. This happens within a two-phase training framework that preserves recommendation quality. Empirically, RAID outperforms post-training baselines and adversarial methods across four real-world datasets and multiple models, while offering stability and efficiency advantages; ablation and robustness analyses confirm that both the defense and recommendation objectives are necessary. The approach provides practical privacy protection under gray-box attacks and can be extended to multi-attribute defenses and fairness considerations, with minimal impact on utility and improved convergence behavior.

Abstract

In various networks and mobile applications, users are highly susceptible to attribute inference attacks, which are particularly prevalent in recommender systems. Attackers exploit partially exposed user profiles in recommendation models, such as user embeddings, to infer private attributes of target users, such as gender and political views. The goal of defenders is to mitigate the effectiveness of these attacks while maintaining recommendation performance. Most existing defense methods, such as differential privacy and attribute unlearning, focus on post-training settings, which limits their ability to use training data to preserve recommendation performance. Although adversarial training extends defenses to in-training settings, it often struggles to converge because its training process is unstable. In this paper, we propose RAID, an in-training defense method against attribute inference attacks in recommender systems. In addition to the recommendation objective, we define a defensive objective to ensure that the distribution of protected attributes becomes independent of class labels, making users indistinguishable to attribute inference attacks. Specifically, this defensive objective aims to solve a constrained Wasserstein barycenter problem to identify the centroid distribution that makes the attribute indistinguishable while complying with recommendation performance constraints. To optimize our proposed objective, we use optimal transport to align users with the centroid distribution. We conduct extensive experiments on four real-world datasets to evaluate RAID. The experimental results validate the effectiveness of RAID and demonstrate its significant superiority over existing methods in multiple aspects.
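The barycenter-then-transport idea described in the abstract can be illustrated with a toy one-dimensional sketch. This is not RAID's algorithm: it ignores the recommendation-performance constraint, treats each "embedding" as a single number, and assumes equal-size classes. In one dimension the unconstrained Wasserstein-2 barycenter of empirical distributions is obtained by averaging sorted samples (quantiles), and the optimal transport map sends each sample to the barycenter point of the same rank. All function and variable names (`barycenter_1d`, `transport_to`, `class_a`, `class_b`) are hypothetical.

```python
import random
import statistics


def barycenter_1d(groups, weights=None):
    """Unconstrained W2 barycenter of equal-size 1-D samples (quantile averaging)."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal-size groups")
    if weights is None:
        weights = [1.0 / k] * k
    sorted_groups = [sorted(g) for g in groups]
    # The j-th barycenter point is the weighted mean of the j-th order statistics.
    return [sum(w * sg[j] for w, sg in zip(weights, sorted_groups)) for j in range(n)]


def transport_to(group, target_sorted):
    """Monotone (rank-preserving) map: the optimal transport map in 1-D."""
    order = sorted(range(len(group)), key=lambda idx: group[idx])
    moved = [0.0] * len(group)
    for rank, idx in enumerate(order):
        moved[idx] = target_sorted[rank]
    return moved


# Toy data (assumption): two classes of a protected attribute, clearly separated.
rng = random.Random(0)
class_a = [rng.gauss(-2.0, 1.0) for _ in range(500)]
class_b = [rng.gauss(3.0, 0.5) for _ in range(500)]

center = barycenter_1d([class_a, class_b])
aligned_a = transport_to(class_a, center)
aligned_b = transport_to(class_b, center)

gap_before = abs(statistics.mean(class_a) - statistics.mean(class_b))
gap_after = abs(statistics.mean(aligned_a) - statistics.mean(aligned_b))
```

After alignment both classes occupy exactly the same set of values, so the class label can no longer be read off the value. In this unconstrained toy nothing limits how far samples move; the constraint described in the abstract is what ties the centroid to recommendation performance.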

Paper Structure

This paper contains 60 sections, 2 theorems, 33 equations, 6 figures, 12 tables, 1 algorithm.

Key Result

Lemma 1

Given sample-based empirical distributions $\hat{\mathcal{P}}(Y^i) = \frac{1}{N_i}\sum_{n=1}^{N_i} \delta_{y_n^i}$, $i\in [K]$, the dual problem of penalized Wasserstein barycenter can be formulated as where $g(\cdot)$ denotes the dual problem and $g^{c}_i(y_n^i)$ is defined by Moreover, Eq. (dual-problem) is concave and L-Lipschitz continuous with respect to $g_i$'s. If ${(g_i)}^n_{i=1}$ solve

Figures (6)

  • Figure 1: The difference between the settings of In-Training (InT) and Post-Training (PoT) primarily lies in their implementation stages and dependency on data. The InT setting is implemented during the model training phase, allowing it to protect user attributes while utilizing the original training data to preserve recommendation performance. In contrast, the PoT setting is implemented after the model training is completed, at which point only the model's parameters are accessible \cite{li2023making}. The experimental results of the comparison are reported in Section \ref{subsec:limit}.
  • Figure 2: RAID, established under the InT setting, can directly use the original training data to preserve recommendation performance. Additionally, RAID defines a defensive objective to ensure that the class distribution of protected attributes is independent of class labels, which is equivalent to making all class distributions indistinguishable. Based on this, RAID first calculates a centroid distribution that satisfies the constraints on recommendation performance and then uses optimal transport to align all class distributions with it, rendering the class distributions indistinguishable.
  • Figure 3: The function of each constraint. (a) In the initial stage, distinct decision boundaries between class distributions provide potential attacking foundations. (b) When Constraint 1 is applied, distributions are correctly centered but too compact, leading to significant overlap among user embeddings and reducing their ability to distinguish information beyond protected attributes. (c) When Constraint 2 is applied, the centroids become overly relaxed and unstable, causing only slight movements of each class distribution. There remains a clear decision boundary between the class distributions.
  • Figure 4: Distribution of user embedding on NCF, where different colors denote different class values (e.g., male vs. female).
  • Figure 5: Effect of the hyper-parameter penalization coefficient $\eta$.
  • ...and 1 more figure
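The captions above describe clear decision boundaries between class distributions as the foundation of an attack, and indistinguishable class distributions as the goal of the defense. A minimal sketch of that principle follows, assuming a hypothetical nearest-centroid attacker on one-dimensional toy embeddings. The "shared" data is simulated by drawing both classes from one distribution; it is not produced by RAID, and the attacker is a stand-in rather than the attack model used in the paper.

```python
import random


def fit_centroids(xs, labels):
    """Attacker training: one mean embedding per attribute class."""
    sums, counts = {}, {}
    for x, y in zip(xs, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Assign the class whose centroid is closest to the embedding."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def attack_accuracy(train, test):
    """Fit the attacker on one split and score it on a held-out split."""
    centroids = fit_centroids(*train)
    xs, labels = test
    hits = sum(predict(centroids, x) == y for x, y in zip(xs, labels))
    return hits / len(xs)


def make_split(rng, mean0, mean1, n):
    """Toy embeddings (assumption): unit-variance Gaussians, one per class."""
    xs = [rng.gauss(mean0, 1.0) for _ in range(n)]
    xs += [rng.gauss(mean1, 1.0) for _ in range(n)]
    labels = [0] * n + [1] * n
    return xs, labels


rng = random.Random(0)
# Separated classes: a clear decision boundary exists between them.
acc_separated = attack_accuracy(make_split(rng, -2.0, 2.0, 2000),
                                make_split(rng, -2.0, 2.0, 2000))
# Shared distribution: both classes drawn identically (idealized defense outcome).
acc_shared = attack_accuracy(make_split(rng, 0.0, 0.0, 2000),
                             make_split(rng, 0.0, 0.0, 2000))
```

With separated classes the attacker is right most of the time, whereas with a shared distribution its accuracy stays close to the 50% chance level for a balanced binary attribute.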

Theorems & Definitions (4)

  • Lemma 1
  • proof
  • Lemma 2
  • proof