Inferring Communities of Interest in Collaborative Learning-based Recommender Systems
Yacine Belal, Sonia Ben Mokhtar, Mohamed Maouche, Anthony Simonet-Boulogne
TL;DR
The paper investigates privacy risks in collaborative-learning-based recommender systems by introducing the Community Inference Attack (CIA), a low-cost, comparison-based attack that infers which users belong to the community interested in a given set of target items. CIA works in both Federated Recommender Systems (FedRecs) and Gossip Learning-based Recommender Systems (GossipRecs) without training surrogate models. It is up to 10 times more accurate than random guessing in FL and about 3 times more accurate in GL. The paper also evaluates two defenses, Share less and Differentially Private SGD (DP-SGD). Share less generally improves the privacy-utility trade-off in FedRecs but can be counterproductive in GossipRecs because of model aging effects, while DP-SGD provides formal privacy guarantees at the cost of utility. The results highlight substantial privacy leakage in distributed recommender systems and offer guidance for designing defenses, including the relative value of Share less over DP-SGD and the possible need for new protections in decentralized settings.
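To make the comparison-based idea concrete, here is a minimal Python (NumPy) sketch. It is not the authors' implementation. It assumes a matrix-factorization recommender in which the adversary can read, for each user, a user embedding and that user's local copy of the item embeddings. All function names are hypothetical. Each observed model scores the target items, the models are ranked against each other, and the top-k are guessed as community members. The last function expresses the guess as a multiple of random-guess precision. That is one plausible reading of the random-guess multiples quoted above, not the paper's stated metric.

```python
import numpy as np


def predicted_scores(user_vecs: np.ndarray, item_tables: np.ndarray) -> np.ndarray:
    """Relevance of every item under every observed model (matrix-factorization assumption).

    user_vecs:   (n_users, d)          user embedding of each observed model
    item_tables: (n_users, n_items, d) item embeddings of each observed model
    Returns (n_users, n_items) with score[u, j] = <p_u, q_{u,j}>.
    """
    return np.einsum("ud,ujd->uj", user_vecs, item_tables)


def infer_community(scores: np.ndarray, target_items: list[int], k: int) -> np.ndarray:
    """Hypothetical comparison-based guess: the k models that rank the target items highest."""
    relevance = scores[:, target_items].mean(axis=1)  # one number per observed model
    # No per-model threshold and no surrogate classifier: models are only compared to each other.
    return np.argsort(-relevance)[:k]


def lift_over_random(predicted: np.ndarray, true_members: list[int], n_users: int) -> float:
    """Precision of the guess divided by the expected precision of k uniform random picks."""
    precision = len(set(predicted.tolist()) & set(true_members)) / len(predicted)
    random_precision = len(true_members) / n_users
    return precision / random_precision


# Toy, planted data (not from the paper): 100 users, 20 items, one-hot item embeddings.
rng = np.random.default_rng(0)
n_users, n_items = 100, 20
targets = [0, 1, 2]
members = list(range(5))  # users 0..4 are planted as liking the target items
user_vecs = rng.uniform(-0.1, 0.1, size=(n_users, n_items))
user_vecs[np.ix_(members, targets)] += 1.0
item_tables = np.tile(np.eye(n_items), (n_users, 1, 1))  # every user holds the same item table here

scores = predicted_scores(user_vecs, item_tables)
guess = infer_community(scores, targets, k=len(members))
print(sorted(guess.tolist()))  # [0, 1, 2, 3, 4]
print(lift_over_random(guess, members, n_users))
```

Ranking instead of thresholding is what lets this sketch skip a surrogate model, since only the relative order of the observed models matters. Setting k to the true community size is a simplifying assumption.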
Abstract
Collaborative-learning-based recommender systems, such as those employing Federated Learning (FL) and Gossip Learning (GL), allow users to train models while keeping their history of liked items on their devices. While these methods were seen as promising for enhancing privacy, recent research has shown that collaborative learning can be vulnerable to various privacy attacks. In this paper, we propose a novel attack called Community Inference Attack (CIA), which enables an adversary to identify community members based on a set of target items. What sets CIA apart is its efficiency: it operates at low computational cost by eliminating the need for training surrogate models. Instead, it uses a comparison-based approach, inferring sensitive information by comparing users' models rather than targeting any specific individual model. To evaluate the effectiveness of CIA, we conduct experiments on three real-world recommendation datasets using two recommendation models under both Federated and Gossip-like settings. The results demonstrate that CIA can be up to 10 times more accurate than random guessing. Additionally, we evaluate two mitigation strategies: Differentially Private Stochastic Gradient Descent (DP-SGD) and a Share less policy, which involves sharing fewer, less sensitive model parameters. Our findings suggest that the Share less strategy offers a better privacy-utility trade-off, especially in GL.
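The two mitigation strategies can be sketched as follows. This is an illustration under stated assumptions, not the paper's code. `dp_sgd_step` follows the standard DP-SGD recipe: clip each per-example gradient to L2 norm `clip_norm`, sum the clipped gradients, add Gaussian noise with standard deviation `noise_multiplier * clip_norm`, and average. It omits privacy accounting. `share_less` is a hypothetical filter that withholds named parameter groups before a model is sent. Which groups count as sensitive is an assumption, and the parameter names below are invented for illustration.

```python
import numpy as np


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update on a flat parameter vector (privacy accounting omitted).

    params:            (n_params,) current parameters
    per_example_grads: (batch, n_params) one gradient per training example
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Rescale each per-example gradient so its L2 norm is at most clip_norm.
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    # Gaussian noise scaled to the clipping bound, i.e. to one example's maximum influence.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
    return params - lr * noisy_mean


def share_less(model, withheld):
    """Hypothetical Share less filter: drop withheld parameter groups before sending."""
    return {name: value for name, value in model.items() if name not in withheld}


# Illustrative usage with made-up parameter names and values.
rng = np.random.default_rng(0)
local_model = {
    "item_embedding": np.zeros((10, 4)),
    "mlp_weights": np.zeros((4, 4)),
    "output_weights": np.zeros(4),
}
# Assumption: treat the item embeddings as the sensitive group and keep them on the device.
outgoing = share_less(local_model, withheld={"item_embedding"})
print(sorted(outgoing))  # ['mlp_weights', 'output_weights']

grads = np.array([[3.0, 4.0], [0.0, 1.0]])
print(dp_sgd_step(np.zeros(2), grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Setting `noise_multiplier` to 0 reduces the update to plain clipped SGD, which isolates the utility cost of the noise. The formal guarantee depends on the noise level, the sampling rate and the number of steps, none of which this sketch tracks.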
