Federated Collaborative Filtering for Privacy-Preserving Personalized Recommendation System

Muhammad Ammad-ud-din, Elena Ivannikova, Suleiman A. Khan, Were Oyomno, Qiang Fu, Kuan Eeik Tan, Adrian Flanagan

TL;DR

This paper introduces Federated Collaborative Filtering (FCF), a privacy-preserving extension of classic collaborative filtering that keeps user data on-device while updating the global item-factor model on a central server. By updating user factors locally and aggregating gradients to refine item factors, the approach maintains CF performance on implicit feedback tasks with convergence stabilized by Adam optimization. Experiments on simulated, MovieLens, and in-house data show negligible accuracy loss compared to centralized CF, highlighting the privacy benefits without sacrificing recommender quality. The work lays groundwork for privacy-aware personalized recommendations and outlines avenues for online learning, communication efficiency, and security enhancements in federated recommender systems.
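
The split described above, with user factors updated on the device and item factors refined on the server from aggregated gradients, can be written out under the usual implicit-feedback formulation. This excerpt does not spell out the cost function, so the form below is an assumption: raw interaction counts $r_{ui}$, binary preferences $p_{ui}$, confidence weights $c_{ui} = 1 + \alpha r_{ui}$, a regularizer $\lambda$, a server learning rate $\gamma$, $k$ latent factors and $M$ items.

```latex
J = \sum_{u}\sum_{i} c_{ui}\left(p_{ui} - \textbf{x}_u^{\top}\textbf{y}_i\right)^2
  + \lambda\Big(\sum_{u}\lVert\textbf{x}_u\rVert^2 + \sum_{i}\lVert\textbf{y}_i\rVert^2\Big),
\qquad p_{ui} = \mathbb{1}[r_{ui} > 0], \quad c_{ui} = 1 + \alpha r_{ui}

% Client u, with \textbf{Y} fixed (\textbf{Y} is k x M with columns \textbf{y}_i,
% \textbf{C}^u = \mathrm{diag}(c_{u1}, \dots, c_{uM})): closed-form minimizer of J in \textbf{x}_u
\textbf{x}_u = \left(\textbf{Y}\textbf{C}^u\textbf{Y}^{\top} + \lambda\textbf{I}\right)^{-1}\textbf{Y}\textbf{C}^u\textbf{p}_u

% Gradient contribution that client u sends for item i
\textbf{g}_{ui} = -2\,c_{ui}\left(p_{ui} - \textbf{x}_u^{\top}\textbf{y}_i\right)\textbf{x}_u

% Server: aggregate over all clients, add the regularizer, take a gradient step
\frac{\partial J}{\partial \textbf{y}_i} = \sum_{u}\textbf{g}_{ui} + 2\lambda\,\textbf{y}_i,
\qquad
\textbf{y}_i \leftarrow \textbf{y}_i - \gamma\,\frac{\partial J}{\partial \textbf{y}_i}
```

Under this formulation, $r_{ui}$ and $\textbf{x}_u$ never leave client $u$; the server only receives the vectors $\textbf{g}_{ui}$.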

Abstract

The increasing interest in user privacy is leading to new privacy-preserving machine learning paradigms. In the Federated Learning paradigm, a master machine learning model is distributed to user clients, and the clients use their locally stored data and the model both for inference and for calculating model updates. The model updates are sent back to the server, where they are aggregated to update the master model, which is then redistributed to the clients. In this paradigm, the user data never leaves the client, greatly enhancing the user's privacy, in contrast to the traditional paradigm of collecting, storing and processing user data on a backend server beyond the user's control. In this paper we introduce, as far as we are aware, the first federated implementation of a Collaborative Filter. The federated updates to the model are based on a stochastic gradient approach. As a classical case study in machine learning, we explore a personalized recommendation system based on users' implicit feedback and demonstrate the method's applicability to both the MovieLens dataset and an in-house dataset. Empirical validation confirms that a collaborative filter can be federated without a loss of accuracy compared to a standard implementation, hence enhancing the user's privacy in a widely used recommender application while maintaining recommender performance.
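
The federated update loop described in the abstract can be sketched in plain Python. This is a minimal, standard-library-only illustration, not the authors' code: it assumes the implicit-feedback cost with confidence $c_{ui} = 1 + \alpha r_{ui}$, solves each user's factor in closed form on the client, and lets the server take a plain gradient step on the summed item gradients. The helper names (`client_update`, `server_step`, `loss`, `solve`) and the toy data are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve the small dense system a x = b (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def client_update(Y, r_u, alpha, lam):
    """On-device step for one user (hypothetical helper).

    Y   : item factors received from the server, one k-vector per item.
    r_u : this user's raw interaction counts; they never leave this function.
    Returns the private user factor x_u and one gradient k-vector per item.
    """
    k = len(Y[0])
    p = [1.0 if r > 0 else 0.0 for r in r_u]   # binary preference p_ui
    c = [1.0 + alpha * r for r in r_u]         # assumed confidence c_ui = 1 + alpha * r_ui
    A = [[lam if a == b else 0.0 for b in range(k)] for a in range(k)]
    rhs = [0.0] * k
    for y, p_i, c_i in zip(Y, p, c):
        for a in range(k):
            rhs[a] += c_i * p_i * y[a]
            for b in range(k):
                A[a][b] += c_i * y[a] * y[b]
    x = solve(A, rhs)                          # x_u = (Y C Y^T + lam I)^-1 Y C p
    grads = [[-2.0 * c_i * (p_i - dot(x, y)) * x_a for x_a in x]
             for y, p_i, c_i in zip(Y, p, c)]
    return x, grads


def server_step(Y, client_grads, lam, gamma):
    """Server: sum the clients' item gradients, add 2*lam*y_i, take one gradient step."""
    new_Y = []
    for i, y in enumerate(Y):
        g = [2.0 * lam * y_a for y_a in y]
        for grads in client_grads:
            g = [g_a + u_a for g_a, u_a in zip(g, grads[i])]
        new_Y.append([y_a - gamma * g_a for y_a, g_a in zip(y, g)])
    return new_Y


def loss(Y, X, R, alpha, lam):
    """Full objective J; it needs every client's raw counts, so it is for simulation only."""
    total = lam * (sum(dot(x, x) for x in X) + sum(dot(y, y) for y in Y))
    for x, r_u in zip(X, R):
        for y, r in zip(Y, r_u):
            p = 1.0 if r > 0 else 0.0
            total += (1.0 + alpha * r) * (p - dot(x, y)) ** 2
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    R = [[3, 0, 1, 0], [0, 2, 0, 1], [1, 1, 0, 0]]                    # toy counts: 3 clients x 4 items
    Y = [[rng.uniform(0.0, 0.1) for _ in range(2)] for _ in range(4)]  # k = 2 factors per item
    alpha, lam, gamma = 1.0, 0.1, 0.001
    for rnd in range(50):
        updates = [client_update(Y, r_u, alpha, lam) for r_u in R]     # runs on each device
        if rnd % 10 == 0:
            print(rnd, loss(Y, [x for x, _ in updates], R, alpha, lam))
        Y = server_step(Y, [g for _, g in updates], lam, gamma)        # runs on the server
```

Because each client's factor is an exact minimizer for the current item factors and the server step is a small gradient step on a quadratic, the simulated objective should not increase from round to round for a sufficiently small `gamma`. The `loss` helper is only meaningful in a local simulation like this one; in a deployment the server would see nothing but the gradient lists returned by `client_update`.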

Paper Structure

This paper contains 18 sections, 19 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: Collaborative Filtering in the Federated Learning Paradigm. The Master Model $\textbf{Y}$ (item-factor matrix) is updated on the server and then distributed to the clients. Each user-specific model $\textbf{X}$ (user-factor matrix) remains on the local client and is updated there using the local user data and the $\textbf{Y}$ received from the server. The gradients of $\textbf{Y}$ are computed on each client and transmitted to the server, where they are aggregated to update the master model $\textbf{Y}$.
  • Figure 2: Convergence analysis of the FCF model. The y-axis shows the $\%$ difference between the elements of the latent factors $\textbf{Y}_{\text{FCF}}$ and $\textbf{Y}_{\text{CF}}$, and the x-axis shows iterations. With 20 gradient descent iterations per epoch, the vertical lines mark the start and end of each epoch. Top-Left, $\alpha = 1, \gamma = 0.05$, epoch $= 1$; Top-Middle, $\alpha = 2-6, \gamma = 0.05$, epoch $= 1$; Top-Right, $\alpha = 2-6, \gamma = 0.025$, epoch $= 2$; Bottom-Left, $\alpha = 10, \gamma = 0.05$, epoch $= 20$; Bottom-Middle, $\alpha = 1-1000, \gamma = 0.2$, epoch $= 1$ using the Adam adaptive learning rate; Bottom-Right, $\alpha = 10, \gamma = 0.2$, epoch $= 20$ using the Adam adaptive learning rate.
  • Figure 3: Comparison between the Collaborative Filter (CF) and the Federated Collaborative Filter (FCF) in the form of posterior distributions drawn from a correlated Bayesian t-test, on the MovieLens (top row) and in-house production (bottom row) datasets. The performance metrics Precision, Recall, F1, MAP and RMSE are shown in columns. The vertical lines (ROPE) define a region of practical equivalence in which the mean difference in performance is no more than $\pm$0.5%. The area under this distribution in the interval [-0.005, 0.005] is 0.999, confirming that the performance of the two models is statistically similar.
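
The bottom panels of Figure 2 replace the fixed learning rate with Adam. A minimal sketch of a server-side Adam step applied to the aggregated item gradient follows. The class name `ServerAdam`, the flattened parameter layout and the beta1/beta2/eps defaults (the commonly used Adam defaults) are assumptions; only `lr = 0.2` mirrors the $\gamma$ used in those panels.

```python
import math


class ServerAdam:
    """Adam state for a flattened item-factor vector (hypothetical helper)."""

    def __init__(self, size, lr=0.2, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * size   # running mean of the gradient (first moment)
        self.v = [0.0] * size   # running mean of the squared gradient (second moment)
        self.t = 0              # step counter used for bias correction

    def step(self, params, grad):
        """Return updated params given the gradient aggregated from all clients."""
        self.t += 1
        out = []
        for i, (p, g) in enumerate(zip(params, grad)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)   # bias-corrected first moment
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)   # bias-corrected second moment
            out.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out
```

To use it, flatten $\textbf{Y}$ into one list before calling `step` and reshape afterwards; the optimizer state lives only on the server, so clients are unaffected. Because Adam divides each coordinate's step by a running estimate of the gradient magnitude, the effective step stays close to $\gamma$ even when a larger $\alpha$ inflates the confidence weights and hence the raw gradients. That is one plausible reading of why the Adam panels tolerate $\alpha$ up to 1000, not a claim made in the caption.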
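
The equivalence check in Figure 3 can be outlined as follows. The sketch assumes the common correlated t-test posterior for k-fold cross-validation: a Student-t with n-1 degrees of freedom, centred on the mean per-fold difference, whose scale is inflated by rho/(1-rho), with rho = n_test/n_total, to account for overlapping training sets. It estimates the posterior mass inside the ROPE by Monte Carlo sampling so that only the standard library is needed; the function name, its arguments and the toy differences are hypothetical.

```python
import math
import random
from statistics import mean, variance


def correlated_t_posterior_rope(diffs, rho, rope=0.005, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(-rope < mu < rope) under the correlated t-test posterior.

    diffs : per-fold performance differences (e.g. CF metric minus FCF metric).
    rho   : correlation between folds; 1/k is a common choice for k-fold CV.
    """
    n = len(diffs)
    loc = mean(diffs)
    scale = math.sqrt((1.0 / n + rho / (1.0 - rho)) * variance(diffs))  # sample variance, n-1
    df = n - 1
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        # Student-t draw: standard normal / sqrt(chi-square(df) / df); chi2(df) = Gamma(df/2, scale=2)
        t = rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
        if -rope < loc + scale * t < rope:
            inside += 1
    return inside / n_samples


if __name__ == "__main__":
    d = [0.001, -0.002, 0.0005, 0.0015, -0.001, 0.0, 0.002, -0.0005, 0.001, -0.0015]
    print(correlated_t_posterior_rope(d, rho=0.1))                      # toy 10-fold differences
    print(correlated_t_posterior_rope([x + 0.05 for x in d], rho=0.1))  # same spread, shifted by 0.05
```

With per-fold differences within about $\pm$0.002 and rho = 0.1, nearly all posterior mass should fall inside $\pm$0.005; shifting the same differences by 0.05 should move almost all of it outside.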