RAID: An In-Training Defense against Attribute Inference Attacks in Recommender Systems

Xiaohua Feng; Yuyuan Li; Fengyuan Yu; Ke Xiong; Junjie Fang; Li Zhang; Tianyu Du; Chaochao Chen

TL;DR

Attribute inference attacks threaten user privacy in recommender systems by exploiting exposed user embeddings. RAID is an in-training defense that adds a defensive objective to the recommendation objective: it makes the embedding distributions of the protected-attribute classes indistinguishable by steering them toward a centroid distribution $\mathcal{P}^*$. The centroid is obtained by solving a constrained Wasserstein-2 barycenter problem, users are aligned to it with optimal transport, and the whole procedure runs inside a two-phase training framework that preserves recommendation quality. Empirically, RAID outperforms post-training baselines and adversarial methods across four real-world datasets and multiple recommendation models, and it trains more stably and efficiently. Ablation and robustness analyses confirm that both the defensive and the recommendation objectives are necessary. The approach provides practical privacy protection under gray-box attacks with minimal impact on utility and improved convergence behavior, and it can be extended to multi-attribute defenses and fairness considerations.

Abstract

Users of networked and mobile applications are highly susceptible to attribute inference attacks, and such attacks are particularly prevalent in recommender systems. Attackers exploit partially exposed user profiles in recommendation models, such as user embeddings, to infer private attributes of target users, such as gender and political views. Defenders aim to reduce the effectiveness of these attacks while maintaining recommendation performance. Most existing defense methods, such as differential privacy and attribute unlearning, focus on post-training settings, which limits their ability to use training data to preserve recommendation performance. Although adversarial training extends defenses to in-training settings, it often struggles to converge because its training process is unstable. In this paper, we propose RAID, an in-training defense method against attribute inference attacks in recommender systems. In addition to the recommendation objective, we define a defensive objective that makes the distribution of protected attributes independent of class labels, so that users become indistinguishable to attribute inference attacks. Specifically, this defensive objective solves a constrained Wasserstein barycenter problem to identify the centroid distribution that makes the attribute indistinguishable while complying with recommendation performance constraints. To optimize the proposed objective, we use optimal transport to align users with the centroid distribution. We conduct extensive experiments on four real-world datasets to evaluate RAID. The experimental results validate the effectiveness of RAID and demonstrate its significant superiority over existing methods in multiple aspects.
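
The barycenter-and-transport idea can be illustrated with a toy sketch. It is not RAID's algorithm: it assumes one-dimensional embeddings, two attribute classes of equal size, equal class weights, and no recommendation constraint, so it computes the plain (unconstrained) Wasserstein-2 barycenter $\arg\min_{\mathcal{P}} \sum_k \tfrac{1}{K} W_2^2(\mathcal{P}_k, \mathcal{P})$. In one dimension this barycenter has a closed form, because its quantile function is the average of the class quantile functions. The function names `barycenter_1d` and `transport_to_barycenter` and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def barycenter_1d(groups):
    """Equal-weight W2 barycenter of 1-D empirical distributions.

    Assumes every group has the same number of samples. In 1-D the
    barycenter's quantile function is the mean of the groups' quantile
    functions, so averaging the sorted samples position by position is exact.
    """
    sorted_groups = np.stack([np.sort(g) for g in groups])
    return sorted_groups.mean(axis=0)  # already in ascending order


def transport_to_barycenter(group, barycenter):
    """Monotone optimal transport map in 1-D.

    The k-th smallest sample of `group` is moved to the k-th smallest
    barycenter point, which preserves the ordering of users within a class.
    """
    ranks = np.argsort(np.argsort(group))  # rank of each sample in its group
    return barycenter[ranks]


# Synthetic 1-D "embeddings" for two classes of a protected attribute.
rng = np.random.default_rng(0)
emb_a = rng.normal(loc=-1.0, scale=0.5, size=500)
emb_b = rng.normal(loc=1.5, scale=1.0, size=500)

center = barycenter_1d([emb_a, emb_b])
aligned_a = transport_to_barycenter(emb_a, center)
aligned_b = transport_to_barycenter(emb_b, center)

# After alignment both classes share the same empirical distribution, so a
# classifier that sees only the aligned values cannot separate the classes.
print("mean gap before:", abs(emb_a.mean() - emb_b.mean()))
print("mean gap after: ", abs(aligned_a.mean() - aligned_b.mean()))
```

In this toy setting the two classes collapse onto exactly the same set of values. For high-dimensional embeddings there is no such closed form, and the constrained problem described in the abstract must additionally trade this alignment off against recommendation performance.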
