Federated Recommender System with Data Valuation for E-commerce Platform
Jongwon Park, Minku Kang, Wooseok Sim, Soyoung Lee, Hogun Park
TL;DR
The paper addresses privacy-preserving recommender systems that leverage large-scale public global data in a federated setting. It introduces FedGDVE, built around a Graph Data Value Estimator (GDVE) that selectively augments each client's local user-item graph with globally available interactions that semantically align with the client's distribution. The estimator combines a graph encoder, a local valid predictor, and a reinforcement-learning-based probability estimator to filter the global data. Key contributions include the GDVE architecture, a two-stage training procedure (pre-training followed by RL-based data selection), and empirical evidence of improvements of up to 34.86% over strong FL baselines while maintaining privacy and efficiency. The approach targets scalable, personalized recommendation in heterogeneous, real-world e-commerce ecosystems, where a platform coordinates many stores without any of them sharing raw interaction data.
Abstract
Federated Learning (FL) is gaining prominence in machine learning as privacy concerns grow. This paradigm allows each client (e.g., an individual online store) to train a recommendation model locally while sharing only model updates, without exposing its raw interaction logs to a central server, thereby preserving privacy in a decentralized environment. Nonetheless, most existing FL-based recommender systems still rely solely on each client's private data, despite the abundance of publicly available datasets that could be leveraged to enrich local training; this potential remains largely underexplored. To this end, we consider a realistic scenario wherein a large shopping platform collaborates with multiple small online stores to build a global recommender system. The platform possesses global data, such as shareable user and item lists, while each store holds a portion of the interaction data privately (i.e., locally). Although integrating global data can help mitigate the limitations of clients' sparse and biased local data, it also introduces additional challenges: simply combining all global interactions can amplify noise and irrelevant patterns, worsening personalization and increasing computational costs. To address these challenges, we propose FedGDVE, which selectively augments each client's local graph with semantically aligned samples from the global dataset. FedGDVE employs (i) a pre-trained graph encoder to extract global structural features, (ii) a local valid predictor to assess client-specific relevance, and (iii) a reinforcement-learning-based probability estimator to filter and sample only the most pertinent global interactions. FedGDVE improves performance by up to 34.86% on recognized benchmarks in FL environments.
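To make the selection step more concrete, the following is a minimal sketch of reinforcement-learning-based sampling of global interactions, written under strong simplifying assumptions; it is not the authors' implementation. Each global interaction is assumed to carry a scalar alignment score with respect to the client (in FedGDVE this role would be played by encoder features), the probability estimator is reduced to a two-parameter logistic model, and `toy_reward` is a hypothetical stand-in for the signal that the local valid predictor would provide. All function names are illustrative.

```python
import numpy as np

def selection_probs(align, w, b):
    """Bernoulli selection probability per global interaction (logistic in alignment)."""
    return 1.0 / (1.0 + np.exp(-(w * align + b)))

def reinforce_update(w, b, align, mask, reward, baseline, lr=0.1):
    """One REINFORCE step: raise the log-probability of the sampled mask
    when the reward exceeds the baseline, lower it otherwise."""
    probs = selection_probs(align, w, b)
    dlogit = mask.astype(float) - probs  # d log p(mask) / d logit for a Bernoulli
    adv = reward - baseline
    w_new = w + lr * adv * float(np.sum(dlogit * align))
    b_new = b + lr * adv * float(np.sum(dlogit))
    return w_new, b_new

def augment(local_edges, global_edges, mask):
    """Union of the client's private (user, item) edges with the selected global edges."""
    chosen = {edge for edge, keep in zip(global_edges, mask) if keep}
    return set(local_edges) | chosen

def toy_reward(align, mask):
    """Assumed reward: mean alignment of the selected interactions.
    A real system would use validation feedback from the local model instead."""
    return float(align[mask].mean()) if mask.any() else 0.0

def train(align, steps=200, seed=0):
    """Toy training loop with a moving-average baseline for variance reduction."""
    rng = np.random.default_rng(seed)
    w, b, baseline = 0.0, 0.0, 0.0
    for _ in range(steps):
        probs = selection_probs(align, w, b)
        mask = rng.random(align.shape) < probs  # sample which interactions to keep
        reward = toy_reward(align, mask)
        w, b = reinforce_update(w, b, align, mask, reward, baseline)
        baseline = 0.9 * baseline + 0.1 * reward
    return w, b
```

In this sketch only the selected global edges are merged into the client's graph via `augment`; the private edges never leave the client, which mirrors the setting described in the abstract. The logistic estimator and the moving-average baseline are design choices made for brevity, not details taken from the paper.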
