A Privacy Preserving System for Movie Recommendations Using Federated Learning

David Neumann, Andreas Lutz, Karsten Müller, Wojciech Samek

TL;DR

This paper addresses the privacy risks of personalized movie recommendations by presenting a privacy-preserving federated recommender system. It introduces FedQ, a queue-based federated learning method that mitigates the effects of non-i.i.d. data and small local datasets, and couples it with neural-network compression (DeepCABAC) to reduce communication overhead. The authors demonstrate scalability to over 162,000 clients and show that FedQ consistently improves over FedAvg, while compression delivers substantial bandwidth savings with minimal accuracy loss for the candidate generator. The paper points to future enhancements such as integrating differential privacy and learning embeddings under federated learning.

Abstract

Recommender systems have become ubiquitous in recent years. They solve the tyranny-of-choice problem faced by many users and are utilized by many online businesses to drive engagement and sales. Besides other criticisms, such as creating filter bubbles within social networks, recommender systems are often criticized for collecting considerable amounts of personal data. However, personalizing recommendations fundamentally requires personal information. A recent distributed learning scheme called federated learning has made it possible to learn from personal user data without collecting it centrally. Consequently, we present a recommender system for movie recommendations, which provides privacy and thus trustworthiness on multiple levels: First and foremost, it is trained using federated learning and is thus, by its very nature, privacy-preserving, while still enabling users to benefit from global insights. Furthermore, a novel federated learning scheme, called FedQ, is employed, which not only addresses the problem of non-i.i.d. data and small local datasets, but also prevents input data reconstruction attacks by aggregating client updates early. Finally, to reduce the communication overhead, compression is applied, which reduces the exchanged neural network parametrizations to a fraction of their original size. We conjecture that this may also improve data privacy through its lossy quantization stage.

Paper Structure

This paper contains 38 sections, 3 equations, 23 figures, 8 tables, and 3 algorithms.

Figures (23)

  • Figure 1: Flow diagram of the "funnel-like" three-stage architecture of the proposed recommender system, consisting of candidate generation, ranking, and re-ranking stages (inspired by Figure 2 in [dnns-for-youtube-recommendations]).
  • Figure 2: DNN candidate generator model architecture of the recommender system [dnns-for-youtube-recommendations].
  • Figure 3: DNN ranker model architecture of the recommender system.
  • Figure 4: The typical FedAvg scenario with a central coordinating server and several clients with their local data. The central server sends a global model to the clients, which then perform training on local data. The resulting updated local models are sent back to the central server, which aggregates them into a new global model by averaging the model weights.
  • Figure 5: In-depth analysis histograms of the MovieLens 25M dataset: (a) average times between ratings of all users in the dataset, (b) number of ratings per user, (c) number of ratings per movie, and (d) number of ratings of a specific value that were cast by the users.
  • ...and 18 more figures
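
The FedAvg loop described in the caption of Figure 4 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the linear model, the synthetic client data, the learning rate, and the helper names local_update, fedavg_round, and run_demo are all assumptions made for the example. It uses a plain unweighted average of the local models, as the caption describes; weighting each client by its number of local samples is a common variant.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Hypothetical client step: gradient descent on a linear model y ~ w . x."""
    w = list(weights)  # copy, so the global model is not modified in place
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(data)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def fedavg_round(global_w, clients):
    """One round: send the global model out, train locally, average the results."""
    local_models = [local_update(global_w, data) for data in clients]
    n = len(local_models)
    return [sum(m[j] for m in local_models) / n for j in range(len(global_w))]


def run_demo(num_clients=10, samples=20, rounds=50, seed=0):
    """Synthetic setup (an assumption): every client observes the same true model."""
    rng = random.Random(seed)
    true_w = [2.0, -1.0]
    clients = []
    for _ in range(num_clients):
        data = []
        for _ in range(samples):
            x = [rng.gauss(0, 1), rng.gauss(0, 1)]
            y = sum(t * xi for t, xi in zip(true_w, x)) + rng.gauss(0, 0.01)
            data.append((x, y))
        clients.append(data)  # raw data stays with the client
    w = [0.0, 0.0]
    for _ in range(rounds):
        w = fedavg_round(w, clients)
    return w, true_w


if __name__ == "__main__":
    learned, target = run_demo()
    print("learned:", learned, "target:", target)
```

Only model weights cross the client boundary in this sketch; the per-client data lists are never passed to the averaging step, which is the property the caption highlights.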