Fractional-Order Federated Learning

Mohammad Partohaghighi, Roummel Marcia, YangQuan Chen

TL;DR

FOFedAvg addresses federated learning challenges under non-IID data by injecting memory into local updates through a Caputo-type fractional derivative of order $0<\alpha\le 1$, yielding memory-aware, long-range gradient updates. The approach preserves FedAvg’s communication pattern while achieving faster convergence and greater stability, supported by a convergence analysis that shows accumulation at stationary points under standard $L$-smoothness and bounded-variance assumptions. Empirical results across nine diverse datasets demonstrate competitive or superior performance relative to strong FL baselines, with notable communication-efficiency gains in non-IID regimes. The work highlights memory-based optimization as a practical path for robust distributed learning in heterogeneous environments, and outlines future directions for adaptive fractional orders and privacy integrations.

Abstract

Federated learning (FL) allows remote clients to train a global model collaboratively while protecting client privacy. Despite its privacy-preserving benefits, FL faces significant challenges, including slow convergence, high communication cost, and non-independent-and-identically-distributed (non-IID) client data. In this work, we present a novel FedAvg variant called Fractional-Order Federated Averaging (FOFedAvg), which incorporates Fractional-Order Stochastic Gradient Descent (FOSGD) to capture long-range relationships and deeper historical information. By introducing memory-aware fractional-order updates, FOFedAvg improves communication efficiency and accelerates convergence while mitigating instability caused by heterogeneous, non-IID client data. We compare FOFedAvg against a broad set of established federated optimization algorithms on benchmark datasets including MNIST, FEMNIST, CIFAR-10, CIFAR-100, EMNIST, the Cleveland heart disease dataset, Sent140, PneumoniaMNIST, and Edge-IIoTset. Across a range of non-IID partitioning schemes, FOFedAvg is competitive with, and often outperforms, these baselines in terms of test performance and convergence speed. On the theoretical side, we prove that FOFedAvg converges to a stationary point under standard smoothness and bounded-variance assumptions for fractional order $0<\alpha\le 1$. Together, these results show that fractional-order, memory-aware updates can substantially improve the robustness and effectiveness of federated learning, offering a practical path toward distributed training on heterogeneous data.
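
To make the idea of memory-aware local updates inside a FedAvg-style loop more concrete, here is a minimal Python sketch. The update rule in `fractional_step` is an assumption: it is one form that appears in the fractional-order gradient literature, where the step is scaled by $(|\theta_t-\theta_{t-1}|+\delta)^{1-\alpha}/\Gamma(2-\alpha)$, and it is not an equation recovered from this page. The function names, the least-squares toy problem, the full-batch gradients (used in place of stochastic minibatches for determinism), and all hyperparameters are hypothetical.

```python
import math
import numpy as np


def fractional_step(theta, theta_prev, grad, lr, alpha, delta=1e-3):
    """One fractional-order gradient step (ASSUMED form, not taken from the paper)."""
    # Memory factor built from the previous displacement; delta > 0 keeps it
    # from vanishing when theta == theta_prev. With alpha = 1 it equals 1.
    memory = (np.abs(theta - theta_prev) + delta) ** (1.0 - alpha)
    return theta - lr * memory / math.gamma(2.0 - alpha) * grad


def lsq_grad(theta, X, y):
    """Full-batch gradient of 0.5 * mean((X @ theta - y) ** 2)."""
    return X.T @ (X @ theta - y) / len(y)


def local_train(theta_global, X, y, steps, lr, alpha, delta):
    """Client update: one plain gradient step to seed the history, then fractional steps."""
    theta_prev = theta_global.copy()
    theta = theta_global - lr * lsq_grad(theta_global, X, y)
    for _ in range(steps - 1):
        grad = lsq_grad(theta, X, y)
        theta_next = fractional_step(theta, theta_prev, grad, lr, alpha, delta)
        theta_prev, theta = theta, theta_next
    return theta


def fofedavg_sketch(clients, dim, rounds=50, local_steps=5, lr=0.1, alpha=0.9, delta=1e-3):
    """FedAvg-style loop: broadcast, local training, size-weighted averaging."""
    theta = np.zeros(dim)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        local_models = [
            local_train(theta, X, y, local_steps, lr, alpha, delta) for X, y in clients
        ]
        # Server step is unchanged from FedAvg: average weighted by client sample count.
        theta = np.average(local_models, axis=0, weights=sizes)
    return theta


def global_loss(theta, clients):
    """Pooled least-squares loss over all client data."""
    Xs = np.vstack([Xc for Xc, _ in clients])
    ys = np.concatenate([yc for _, yc in clients])
    return 0.5 * np.mean((Xs @ theta - ys) ** 2)


# Toy non-IID setup (hypothetical): clients share one linear model
# but draw their features from differently shifted distributions.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
clients = []
for shift, n in [(-0.5, 30), (0.0, 50), (0.5, 40)]:
    X = rng.normal(loc=shift, scale=1.0, size=(n, 3))
    clients.append((X, X @ theta_true))

theta_hat = fofedavg_sketch(clients, dim=3)
initial_loss = global_loss(np.zeros(3), clients)
final_loss = global_loss(theta_hat, clients)
print(f"pooled loss: {initial_loss:.4f} -> {final_loss:.6f}")
```

With `alpha=1.0` the memory factor equals 1 and the loop reduces to plain FedAvg with gradient-descent clients, so the two can be compared on the same data by changing a single argument. Only the client-side step differs; the communication pattern (one model broadcast and one model upload per round) is the same in both cases.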

Paper Structure

This paper contains 43 sections, 3 theorems, 109 equations, 22 figures, 6 tables, 2 algorithms.

Key Result

Theorem 1

Let $f:\mathbb{R}^d \to \mathbb{R}$ be an $L$-smooth (potentially non-convex) function with a lower bound $f_{\inf}$. Suppose $0 < \alpha \le 1$ (fractional order), and consider the sequence $\{\Theta_t\}$ generated by [...], where [...]. Here, $\bar{\alpha} > 0$ is an upper bound ensuring $\alpha_t \le \bar{\alpha} \le \tfrac{2}{L}$ for all $t$. Then [...]. In particular, if the sequence $\{\Theta_t\}$ is bounded, [...]

Figures (22)

  • Figure 1: Comparison of federated algorithms on the MNIST dataset under a non-IID setting with 10 clients.
  • Figure 2: Comparison of federated algorithms on the CIFAR-10 dataset under a non-IID setting.
  • Figure 3: Comparison of federated learning algorithms on the EMNIST dataset.
  • Figure 4: Comparison of federated learning algorithms on the Cleveland heart disease dataset.
  • Figure 5: Test accuracy of federated learning algorithms on the Sent140 dataset.
  • ...and 17 more figures

Theorems & Definitions (12)

  • Definition 1: Gamma Function
  • Definition 2: Grünwald–Letnikov Derivative
  • Definition 3: Caputo Derivative
  • Theorem 1: Convergence to Stationary Points
  • proof
  • Remark 1: Role of $\delta>0$
  • Proposition 3.1
  • proof
  • Proposition 3.2
  • proof
  • ...and 2 more
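
For orientation, the textbook forms of the three definitions named in this list are sketched below. These are the standard statements from fractional calculus; the paper's own notation, lower terminal $a$, and normalization are not shown on this page and may differ.

```latex
% Gamma function
\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\,dt, \qquad \operatorname{Re}(z) > 0

% Grünwald–Letnikov derivative of order \alpha with lower terminal a
{}^{GL}D^{\alpha}_{a} f(t) = \lim_{h \to 0^+} \frac{1}{h^{\alpha}}
  \sum_{k=0}^{\lfloor (t-a)/h \rfloor} (-1)^k \binom{\alpha}{k} f(t - kh),
\qquad \binom{\alpha}{k} = \frac{\Gamma(\alpha+1)}{\Gamma(k+1)\,\Gamma(\alpha-k+1)}

% Caputo derivative for 0 < \alpha < 1
{}^{C}D^{\alpha}_{a} f(t) = \frac{1}{\Gamma(1-\alpha)}
  \int_a^t \frac{f'(\tau)}{(t-\tau)^{\alpha}}\,d\tau

% Worked example: f(t) = t with a = 0
{}^{C}D^{\alpha}_{0}\, t = \frac{1}{\Gamma(1-\alpha)} \int_0^t (t-\tau)^{-\alpha}\,d\tau
  = \frac{t^{1-\alpha}}{(1-\alpha)\,\Gamma(1-\alpha)} = \frac{t^{1-\alpha}}{\Gamma(2-\alpha)}
```

The worked example shows where a factor of the form $t^{1-\alpha}/\Gamma(2-\alpha)$ comes from, and as $\alpha \to 1$ it tends to 1, the ordinary derivative of $t$. In the Grünwald–Letnikov sum, every past value $f(t-kh)$ contributes with a slowly decaying weight, which is the sense in which fractional derivatives carry memory.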