Fractional-Order Federated Learning

Mohammad Partohaghighi; Roummel Marcia; YangQuan Chen

TL;DR

FOFedAvg addresses federated learning challenges under non-IID data by injecting memory into local updates through a Caputo-type fractional derivative of order $0<\alpha\le 1$, yielding memory-aware, long-range gradient updates. The approach preserves FedAvg’s communication pattern while achieving faster convergence and greater stability, supported by a convergence analysis that shows accumulation at stationary points under standard $L$-smoothness and bounded-variance assumptions. Empirical results across nine diverse datasets demonstrate competitive or superior performance relative to strong FL baselines, with notable communication-efficiency gains in non-IID regimes. The work highlights memory-based optimization as a practical path for robust distributed learning in heterogeneous environments, and outlines future directions for adaptive fractional orders and privacy integrations.

Abstract

Federated learning (FL) allows remote clients to train a global model collaboratively while protecting client privacy. Despite its privacy-preserving benefits, FL has significant drawbacks, including slow convergence, high communication cost, and non-independent-and-identically-distributed (non-IID) data. In this work, we present a novel FedAvg variant called Fractional-Order Federated Averaging (FOFedAvg), which incorporates Fractional-Order Stochastic Gradient Descent (FOSGD) to capture long-range relationships and deeper historical information. By introducing memory-aware fractional-order updates, FOFedAvg improves communication efficiency and accelerates convergence while mitigating instability caused by heterogeneous, non-IID client data. We compare FOFedAvg against a broad set of established federated optimization algorithms on benchmark datasets including MNIST, FEMNIST, CIFAR-10, CIFAR-100, EMNIST, the Cleveland heart disease dataset, Sent140, PneumoniaMNIST, and Edge-IIoTset. Across a range of non-IID partitioning schemes, FOFedAvg is competitive with, and often outperforms, these baselines in terms of test performance and convergence speed. On the theoretical side, we prove that FOFedAvg converges to a stationary point under standard smoothness and bounded-variance assumptions for fractional order $0<\alpha\le 1$. Together, these results show that fractional-order, memory-aware updates can substantially improve the robustness and effectiveness of federated learning, offering a practical path toward distributed training on heterogeneous data.
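
To make the idea of a memory-aware local update concrete, the following is a minimal sketch rather than the authors' implementation. This summary does not reproduce the exact FOSGD rule, so the code assumes a Caputo-type step of a form common in the fractional-gradient literature: the gradient is scaled elementwise by $(|\Theta_t-\Theta_{t-1}|+\delta)^{1-\alpha}/\Gamma(2-\alpha)$, with a small $\delta>0$ for numerical stability. The function names (`caputo_step`, `local_train`, `fofedavg_round`), the plain first local step, and the size-weighted server average are illustrative assumptions.

```python
import math
import numpy as np


def caputo_step(theta, theta_prev, grad, lr, alpha, delta=1e-8):
    """One assumed Caputo-type fractional gradient step (not the paper's exact rule)."""
    # Memory factor built from the previous iterate; delta keeps it positive.
    # For alpha = 1 the factor equals 1 and this reduces to plain gradient descent.
    scale = (np.abs(theta - theta_prev) + delta) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    return theta - lr * scale * grad


def local_train(theta_global, grad_fn, steps, lr, alpha):
    """Run `steps` local updates on one client, starting from the global model."""
    theta_prev = theta_global.copy()
    # Assumption: the first step is a plain gradient step, since no history exists yet.
    theta = theta_global - lr * grad_fn(theta_global)
    for _ in range(steps - 1):
        theta_next = caputo_step(theta, theta_prev, grad_fn(theta), lr, alpha)
        theta_prev, theta = theta, theta_next
    return theta


def fofedavg_round(theta_global, client_grad_fns, client_sizes, steps, lr, alpha):
    """One communication round: fractional local training, then FedAvg-style averaging."""
    local_models = [local_train(theta_global, g, steps, lr, alpha) for g in client_grad_fns]
    weights = np.asarray(client_sizes, dtype=float) / float(sum(client_sizes))
    return sum(w * m for w, m in zip(weights, local_models))
```

In this sketch only the local step differs from FedAvg; the server still receives one model per client per round, which mirrors the claim that the communication pattern is preserved. Setting `alpha=1.0` recovers ordinary FedAvg with gradient-descent local steps, which gives a convenient sanity check.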

Paper Structure (43 sections, 3 theorems, 109 equations, 22 figures, 6 tables, 2 algorithms)

Introduction
Related Work
Federated Learning Algorithms
Fractional Calculus in Optimization
Positioning FOFedAvg’s Contribution
Fractional Calculus in Federated Learning
Fractional Order Stochastic Gradient Descent (FOSGD)
Computational viewpoint.
The Importance of Fractional Calculus in Federated Learning
Fractional Order Federated Averaging Algorithm
Fractional-Order Extension
Convergence analysis
Experiments
Conclusion and Future Work
Appendix
...and 28 more sections

Key Result

Theorem 1

Let $f:\mathbb{R}^d \to \mathbb{R}$ be an $L$-smooth (potentially non-convex) function with a lower bound $f_{\inf}$. Suppose $0 < \alpha \le 1$ (fractional order), and consider the sequence $\{\Theta_t\}$ generated by [...], where [...]. Here, $\bar{\alpha} > 0$ is an upper bound ensuring $\alpha_t \le \bar{\alpha} \le \tfrac{2}{L}$ for all $t$. Then [...]. In particular, if the sequence $\{\Theta_t\}$ is bounded, [...]

Figures (22)

Figure 1: Comparison of federated algorithms on the MNIST dataset under a non-IID setting with 10 clients.
Figure 2: Comparison of federated algorithms on the CIFAR-10 dataset under a non-IID setting.
Figure 3: Comparison of federated learning algorithms on the EMNIST dataset.
Figure 4: Comparison of federated learning algorithms on the Cleveland heart disease dataset.
Figure 5: Test accuracy of federated learning algorithms on the Sent140 dataset.
...and 17 more figures

Theorems & Definitions (12)

Definition 1: Gamma Function
Definition 2: Grünwald--Letnikov Derivative
Definition 3: Caputo Derivative
Theorem 1: Convergence to Stationary Points
proof
Remark 1: Role of $\delta>0$
Proposition 3.1
proof
Proposition 3.2
proof
...and 2 more
