Online federated learning framework for classification

Wenxing Guo, Jinhan Xie, Jianya Lu, Bei Jiang, Hongsheng Dai, Linglong Kong

TL;DR

The paper tackles online federated learning for streaming classification across multiple privacy-sensitive clients. It builds on the generalized distance-weighted discriminant (DWD) and pairs a Majorization-Minimization (MM) algorithm with a renewable estimation scheme, so the model is updated incrementally from summary statistics rather than retrained on stored raw data. The authors prove that the estimator is consistent and asymptotically normal, and that the classifier achieves Bayesian risk consistency. Differential privacy (DP) is added through noise-perturbed updates, giving two algorithms (with and without DP); the DP version comes with a formal privacy guarantee in addition to the statistical results. Experiments on simulated and real data show competitive classification accuracy, lower computation time, and reduced storage requirements in non-IID, streaming, and privacy-constrained federated settings.

Abstract

In this paper, we develop a novel online federated learning framework for classification, designed to handle streaming data from multiple clients while ensuring data privacy and computational efficiency. Our method leverages the generalized distance-weighted discriminant technique, making it robust to both homogeneous and heterogeneous data distributions across clients. In particular, we develop a new optimization algorithm based on the Majorization-Minimization principle, integrated with a renewable estimation procedure, enabling efficient model updates without full retraining. We provide a theoretical guarantee for the convergence of our estimator, proving its consistency and asymptotic normality under standard regularity conditions. In addition, we establish that our method achieves Bayesian risk consistency, ensuring its reliability for classification tasks in federated environments. We further incorporate differential privacy mechanisms to enhance data security, protecting client information while maintaining model performance. Extensive numerical experiments on both simulated and real-world datasets demonstrate that our approach delivers high classification accuracy, significant computational efficiency gains, and substantial savings in data storage requirements compared to existing methods.
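As a rough illustration of the ingredients named in the abstract, the sketch below fits a linear classifier with the generalized DWD loss, $V_q(u) = 1-u$ for $u \le q/(q+1)$ and $V_q(u) = u^{-q}\,q^q/(q+1)^{q+1}$ otherwise, using a quadratic-majorization MM update. It is not the paper's algorithm: it is a single-machine, batch version without the renewable (streaming) or differential-privacy steps, and it assumes a ridge penalty `lam` and the curvature bound $(q+1)^2/q$ on $V_q''$. The function names `dwd_loss`, `dwd_grad`, and `mm_fit` are hypothetical.

```python
import numpy as np

def dwd_loss(u, q=1.0):
    """Generalized DWD loss V_q(u) evaluated elementwise."""
    u = np.asarray(u, dtype=float)
    t = q / (q + 1.0)
    # Replace u <= t by 1.0 so the power branch never sees non-positive values.
    safe = np.where(u > t, u, 1.0)
    tail = (q ** q) / ((q + 1.0) ** (q + 1.0)) / safe ** q
    return np.where(u > t, tail, 1.0 - u)

def dwd_grad(u, q=1.0):
    """Derivative V_q'(u); it equals -1 on the linear part and is continuous."""
    u = np.asarray(u, dtype=float)
    t = q / (q + 1.0)
    safe = np.where(u > t, u, 1.0)
    return np.where(u > t, -(t / safe) ** (q + 1.0), -1.0)

def objective(theta, X1, y, lam, q=1.0):
    """Average DWD loss plus a ridge penalty on the slopes (assumed form)."""
    return dwd_loss(y * (X1 @ theta), q).mean() + lam * np.sum(theta[1:] ** 2)

def mm_fit(X, y, lam=0.1, q=1.0, n_iter=50):
    """Batch MM iterations for labels y in {-1, +1}.

    Each step minimizes a quadratic upper bound of the objective, which only
    needs the Gram matrix X1'X1 and a gradient-like vector, i.e. summary
    statistics rather than per-observation storage.
    """
    n, p = X.shape
    X1 = np.hstack([np.ones((n, 1)), X])   # prepend an intercept column
    M = (q + 1.0) ** 2 / q                 # upper bound on V_q''
    D = np.eye(p + 1)
    D[0, 0] = 0.0                          # the intercept is not penalized
    G = X1.T @ X1 / n
    A = M * G + 2.0 * lam * D
    theta = np.zeros(p + 1)
    history = [objective(theta, X1, y, lam, q)]
    for _ in range(n_iter):
        g = X1.T @ (y * dwd_grad(y * (X1 @ theta), q)) / n
        theta = np.linalg.solve(A, M * (G @ theta) - g)
        history.append(objective(theta, X1, y, lam, q))
    return theta, history
```

Calling `mm_fit(X, y)` on an `(n, p)` feature array with labels coded as ±1 returns the coefficient vector (intercept first) and the objective value after each iteration. Because each step minimizes an upper bound that touches the objective at the current iterate, the recorded objective values should never increase.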

Paper Structure

This paper contains 10 sections, 1 theorem, 133 equations, 10 figures, 4 tables.

Key Result

Theorem 1

The iterative algorithm D-5 converges to $\bm\theta^*$ in probability.

Figures (10)

  • Figure 1: A diagrammatic representation of online and federated learning frameworks
  • Figure 2: Comparison of $V_q'(u)$ and smoothed $V_q'(u)$
  • Figure 3: Comparison of accuracy and time cost for the balanced dataset (two-class data, 1:1 ratio) based on variations in parameter $b$, with $M=20$, $p=50$, $\mu=0.2$, $\sigma=1$, $q=1$, $\epsilon=0.8$ and $\delta=10^{-5}$.
  • Figure 4: Comparison of accuracy and time cost for the balanced dataset (two-class data, 1:1 ratio) based on variations in parameter $M$, with $b=100$, $p=50$, $\mu=0.2$, $\sigma=1$, $q=1$, $\epsilon=0.8$ and $\delta=10^{-5}$.
  • Figure 5: Comparison of accuracy and time cost for the imbalanced dataset (two-class data, 4:1 ratio) based on variations in parameter $\mu$, with $M=20$, $p=50$, $b=100$, $\sigma=1$, $q=1$, $\epsilon=0.8$ and $\delta=10^{-5}$.
  • ...and 5 more figures

Theorems & Definitions (6)

  • Theorem 1
  • proof
  • proof
  • proof
  • proof
  • proof