Online federated learning framework for classification
Wenxing Guo, Jinhan Xie, Jianya Lu, Bei Jiang, Hongsheng Dai, Linglong Kong
TL;DR
The paper tackles online federated learning for streaming classification across multiple privacy-sensitive clients. It introduces a generalized distance-weighted discriminant (DWD) framework that combines a Majorization-Minimization algorithm with a renewable estimation scheme, so that the model is updated incrementally from summary statistics alone. The estimator comes with theoretical guarantees of consistency, asymptotic normality, and Bayesian risk consistency. Differential privacy (DP) is embedded via noise-perturbed updates, yielding two algorithms (with and without DP), and the DP guarantees are proved alongside the statistical properties. Empirical results on simulated and real data show competitive classification accuracy, substantial computational efficiency gains, and reduced storage requirements in non-IID, streaming, and privacy-constrained federated learning (FL) environments.
Abstract
In this paper, we develop a novel online federated learning framework for classification, designed to handle streaming data from multiple clients while ensuring data privacy and computational efficiency. Our method leverages the generalized distance-weighted discriminant technique, making it robust to both homogeneous and heterogeneous data distributions across clients. In particular, we develop a new optimization algorithm based on the Majorization-Minimization principle, integrated with a renewable estimation procedure, enabling efficient model updates without full retraining. We provide a theoretical guarantee for the convergence of our estimator, proving its consistency and asymptotic normality under standard regularity conditions. In addition, we establish that our method achieves Bayesian risk consistency, ensuring its reliability for classification tasks in federated environments. We further incorporate differential privacy mechanisms to enhance data security, protecting client information while maintaining model performance. Extensive numerical experiments on both simulated and real-world datasets demonstrate that our approach delivers high classification accuracy, significant computational efficiency gains, and substantial savings in data storage requirements compared to existing methods.
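To make the idea of updating from summary statistics concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the generalized DWD loss with exponent q = 1, a ridge penalty, a quadratic majorizer built from the Lipschitz constant (q+1)^2/q of the loss derivative, and a single majorization step per incoming round. The names `client_summary`, `Server`, `make_batch`, and `run_demo`, the penalty weight `lam`, and the `noise_scale` parameter are all hypothetical. The optional Gaussian noise is uncalibrated and only marks where a noise-perturbed update could enter; it carries no differential privacy guarantee.

```python
import numpy as np

Q = 1.0                    # DWD exponent q (assumed value)
M = (Q + 1.0) ** 2 / Q     # Lipschitz constant of the loss derivative, used as curvature bound
THRESH = Q / (Q + 1.0)     # margin at which the loss switches from linear to polynomial decay


def dwd_loss_grad(u):
    """Derivative of the generalized DWD loss: -1 for u <= q/(q+1), else -(q/(q+1))^(q+1) / u^(q+1)."""
    u = np.asarray(u, dtype=float)
    return -(THRESH / np.maximum(u, THRESH)) ** (Q + 1.0)


def client_summary(X, y, theta, noise_scale=0.0, rng=None):
    """Summary statistics a client sends for one batch: Gram matrix, loss gradient, batch size."""
    Xt = np.hstack([np.ones((X.shape[0], 1)), X])      # prepend intercept column
    margins = y * (Xt @ theta)
    grad = Xt.T @ (y * dwd_loss_grad(margins))
    if noise_scale > 0.0:
        # Hypothetical, uncalibrated perturbation; no DP guarantee is claimed here.
        rng = np.random.default_rng() if rng is None else rng
        grad = grad + rng.normal(scale=noise_scale, size=grad.shape)
    return Xt.T @ Xt, grad, X.shape[0]


class Server:
    """Keeps only the current estimate, the accumulated Gram matrix, and the sample count."""

    def __init__(self, dim, lam=1e-3):
        self.theta = np.zeros(dim + 1)                 # [intercept, coefficients]
        self.gram = np.zeros((dim + 1, dim + 1))       # running sum of X~^T X~ over all rounds
        self.n_seen = 0
        self.lam = lam
        self.pen = np.eye(dim + 1)
        self.pen[0, 0] = 0.0                           # intercept is not penalized

    def update(self, summaries):
        """One majorization step using this round's gradients and the accumulated curvature."""
        grad = np.zeros_like(self.theta)
        n_new = 0
        for gram_k, grad_k, n_k in summaries:
            self.gram += gram_k
            grad += grad_k
            n_new += n_k
        self.n_seen += n_new
        grad += 2.0 * self.lam * n_new * (self.pen @ self.theta)
        curvature = M * self.gram + 2.0 * self.lam * self.n_seen * self.pen
        self.theta = self.theta - np.linalg.solve(curvature, grad)
        return self.theta


def make_batch(rng, n, shift):
    """Two Gaussian classes with labels in {-1, +1}; `shift` mimics a client-specific offset."""
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * 1.5 + shift + rng.normal(size=(n, 2))
    return X, y


def run_demo(seed=0, rounds=30):
    rng = np.random.default_rng(seed)
    server = Server(dim=2)
    for _ in range(rounds):
        # Two clients with different offsets; raw data never reaches the server.
        summaries = [client_summary(*make_batch(rng, 100, s), server.theta) for s in (-0.3, 0.3)]
        server.update(summaries)
    X_test, y_test = make_batch(rng, 2000, 0.0)
    X_test = np.hstack([np.ones((X_test.shape[0], 1)), X_test])
    accuracy = float(np.mean(np.sign(X_test @ server.theta) == y_test))
    return server, accuracy


if __name__ == "__main__":
    server, accuracy = run_demo()
    print("samples seen:", server.n_seen, "test accuracy:", round(accuracy, 3))
```

Because the quadratic majorizer's curvature depends on the data only through the Gram matrix, the server in this sketch can accumulate it across rounds and discard each batch after use, which is the storage saving the abstract refers to. A fuller treatment would iterate the majorization step to convergence within each round and calibrate the noise to a stated privacy budget.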
