PA-CFL: Privacy-Adaptive Clustered Federated Learning for Transformer-Based Sales Forecasting on Heterogeneous Retail Data
Yunbo Long, Liming Xu, Ge Zheng, Alexandra Brintrup
TL;DR
PA-CFL tackles the dual challenge of data heterogeneity and privacy in cross-region retail demand forecasting. It clusters participants into privacy-preserving "bubbles" based on differentially private feature-importance distributions, then conducts Transformer-based federated learning within each bubble. Clustering uses the Earth Mover's Distance between these distributions, and the Davies-Bouldin Index determines the optimal number of bubbles. Clients that end up alone in a single-client bubble are flagged as potential attackers, which improves robustness. Within each bubble, a Sales Transformer is trained via FedAvg to forecast demand. This yields significant improvements over both local training and FedAvg across diverse regions and remains robust to varying privacy budgets. The approach also adapts dynamically to data noise and to the number of participating clients, and it includes mechanisms for detecting and mitigating poisoned inputs, offering a scalable, privacy-preserving template for federated time-series forecasting in heterogeneous retail networks.
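The bubble-formation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each client's feature-importance vector is privatized with the Laplace mechanism, assuming an L1 sensitivity of 1. EMD is computed in closed form for 1-D distributions over feature indices with unit ground distance. Clustering uses a simple k-medoids on the EMD matrix, and the paper's clustering algorithm may differ. The Davies-Bouldin Index is evaluated with medoids in place of centroids. All function names (`privatize_importance`, `emd_1d`, `k_medoids`, `davies_bouldin`, `form_bubbles`) and the toy data are hypothetical.

```python
import numpy as np


def privatize_importance(importance, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism on a feature-importance vector, then clip and renormalize.

    Assumption: L1 sensitivity of 1.0; the paper's noise calibration may differ.
    """
    noisy = importance + rng.laplace(0.0, sensitivity / epsilon, size=importance.shape)
    noisy = np.clip(noisy, 1e-12, None)  # keep weights positive so it stays a distribution
    return noisy / noisy.sum()


def emd_1d(p, q):
    """EMD between two distributions over the same ordered feature indices (unit spacing)."""
    return float(np.abs(np.cumsum(p - q)).sum())


def k_medoids(D, k, n_iter=50):
    """Simple k-medoids on a precomputed distance matrix (hypothetical stand-in clusterer)."""
    medoids = [int(np.argmin(D.sum(axis=1)))]  # start from the most central client
    while len(medoids) < k:  # then add the client farthest from all current medoids
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total distance to its cluster mates
                new[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return np.argmin(D[:, medoids], axis=1), medoids


def davies_bouldin(D, labels, medoids):
    """Davies-Bouldin Index using medoids instead of centroids (lower is better)."""
    k = len(medoids)
    scatter = np.array([D[labels == c, medoids[c]].mean() for c in range(k)])
    worst = [
        max((scatter[i] + scatter[j]) / D[medoids[i], medoids[j]] for j in range(k) if j != i)
        for i in range(k)
    ]
    return float(np.mean(worst))


def form_bubbles(importances, epsilon, k_max=4, seed=0):
    """Privatize, compute pairwise EMD, choose k by minimum DBI, flag single-client bubbles."""
    rng = np.random.default_rng(seed)
    private = np.array([privatize_importance(v, epsilon, rng) for v in importances])
    n = len(private)
    D = np.array([[emd_1d(private[i], private[j]) for j in range(n)] for i in range(n)])
    best = None
    for k in range(2, min(k_max, n - 1) + 1):
        labels, medoids = k_medoids(D, k)
        score = davies_bouldin(D, labels, medoids)
        if best is None or score < best[0]:
            best = (score, labels)
    labels = best[1]
    flagged = [i for i in range(n) if np.sum(labels == labels[i]) == 1]
    return labels, flagged


# Toy example: three "early-feature" clients, three "late-feature" clients, one outlier.
importances = np.array([
    [0.60, 0.30, 0.10, 0.00, 0.00, 0.00],
    [0.50, 0.40, 0.10, 0.00, 0.00, 0.00],
    [0.55, 0.35, 0.10, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.10, 0.30, 0.60],
    [0.00, 0.00, 0.00, 0.10, 0.40, 0.50],
    [0.00, 0.00, 0.00, 0.10, 0.35, 0.55],
    [0.00, 0.00, 0.50, 0.50, 0.00, 0.00],
])
labels, flagged = form_bubbles(importances, epsilon=1000.0)
print("bubble labels:", labels, "flagged clients:", flagged)
```

With a loose privacy budget, the lowest DBI is reached with three bubbles on this toy data, and the outlier client lands alone and is flagged. Computing the medoid-based DBI keeps everything in terms of pairwise EMD, so no Euclidean centroid of probability vectors is needed. Smaller `epsilon` values add more noise and can blur the bubble boundaries.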
Abstract
Federated learning (FL) enables retailers to share model parameters for demand forecasting while keeping their data private. However, data heterogeneity across diverse regions, driven by factors such as varying consumer behavior, undermines the effectiveness of FL. To tackle this challenge, we propose Privacy-Adaptive Clustered Federated Learning (PA-CFL), tailored for demand forecasting on heterogeneous retail data. Using differentially private feature-importance distributions, PA-CFL groups retailers into distinct "bubbles", each of which forms its own federated learning system, thereby isolating data heterogeneity. Within each bubble, Transformer models are trained to predict each client's local sales. Our experiments show that PA-CFL significantly surpasses FedAvg and outperforms local learning in demand forecasting for all participating clients. Compared to local learning, PA-CFL achieves a 5.4% improvement in R², a 69% reduction in RMSE, and a 45% decrease in MAE. PA-CFL supports effective FL by adapting to diverse noise levels and to the number of clients participating in each bubble. By grouping participants and proactively filtering out high-risk clients, PA-CFL mitigates potential threats to the FL system, and its ability to detect and neutralize poisoned data from clients further enhances the system's robustness and reliability. Overall, these findings demonstrate that PA-CFL enhances federated learning for time-series prediction on heterogeneous data, balancing forecasting accuracy and privacy preservation in retail applications.
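Training within a bubble can be illustrated by the aggregation half of a FedAvg round. The sketch below is a minimal illustration, not the paper's code. It assumes each client's model is a list of NumPy arrays, omits the local Sales Transformer training entirely, weights clients by their number of training samples (standard FedAvg), and simply excludes flagged clients. The names `fedavg` and `aggregate_per_bubble`, and the toy parameters, are hypothetical.

```python
import numpy as np


def fedavg(params_list, n_samples):
    """FedAvg aggregation: average each parameter tensor, weighted by client sample counts."""
    weights = np.asarray(n_samples, dtype=float)
    weights = weights / weights.sum()
    n_tensors = len(params_list[0])
    return [sum(w * params[t] for w, params in zip(weights, params_list)) for t in range(n_tensors)]


def aggregate_per_bubble(client_params, n_samples, labels, flagged):
    """One aggregation step per bubble; flagged (high-risk) clients are excluded.

    client_params: one list of NumPy arrays per client, e.g. the weights of a
    hypothetical per-client Transformer after a round of local training.
    """
    bubbles = {}
    for cid, label in enumerate(labels):
        if cid in flagged:
            continue  # filtered out: never contributes to any bubble's shared model
        bubbles.setdefault(int(label), []).append(cid)
    return {
        bubble: fedavg([client_params[c] for c in members], [n_samples[c] for c in members])
        for bubble, members in bubbles.items()
    }


# Toy round: clients 0 and 1 share bubble 0; client 2 is alone in bubble 1 and flagged.
client_params = [[np.array([1.0, 1.0])], [np.array([3.0, 3.0])], [np.array([100.0, 100.0])]]
n_samples = [100, 300, 50]
models = aggregate_per_bubble(client_params, n_samples, labels=[0, 0, 1], flagged=[2])
print(models)
```

Because each bubble keeps its own aggregated model, clients in different bubbles never average their parameters together. This is what isolates heterogeneity, and the flagged client's outlying parameters never reach any shared model.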
