Adaptive Federated Learning via New Entropy Approach

Shensheng Zheng; Wenhao Yuan; Xuehe Wang; Lingjie Duan

TL;DR

FedEnt addresses Non-IID challenges in federated learning with an entropy-based, decentralized adaptive learning-rate strategy. It derives a closed-form per-client learning rate through a discrete-time Hamiltonian framework and uses mean-field estimators to approximate other clients' influence without inter-client communication; a fixed-point iteration is used to establish that these estimators exist and to determine them. The theoretical results bound client drift and the global loss, yielding a convergence rate that improves upon FedAvg. Experiments on MNIST, EMNIST-L, CIFAR10, and CIFAR100 under Dirichlet-induced heterogeneity show that FedEnt converges faster and reaches higher accuracy than FedAvg, FedAdam, FedProx, and FedDyn.
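The Dirichlet-induced heterogeneity mentioned above is commonly produced by drawing, for each class, a Dirichlet-distributed share per client and splitting that class's samples accordingly. The following is a minimal sketch of that general technique, not the paper's exact partitioning code; the function name `dirichlet_partition` and the parameters `num_clients`, `alpha`, and `seed` are illustrative assumptions.

```python
import random


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Assign sample indices to clients using per-class Dirichlet shares.

    Hypothetical helper for illustration; smaller alpha gives more skewed
    (more Non-IID) label distributions across clients.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    clients = [[] for _ in range(num_clients)]
    for cls in sorted(by_class):
        idx = by_class[cls]
        rng.shuffle(idx)

        # A symmetric Dirichlet(alpha) sample is obtained by normalizing
        # independent Gamma(alpha, 1) draws.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        shares = [g / total for g in gammas]

        # Turn cumulative shares into contiguous slices of this class.
        start, cum = 0, 0.0
        for c, share in enumerate(shares):
            cum += share
            # The last client takes the remainder so no sample is dropped.
            end = len(idx) if c == num_clients - 1 else int(cum * len(idx))
            clients[c].extend(idx[start:end])
            start = end
    return clients


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(1000)]  # toy labels, 10 classes
    parts = dirichlet_partition(toy_labels, num_clients=5, alpha=0.5)
    print([len(p) for p in parts])
```

Every sample is assigned to exactly one client, so the partition sizes always sum to the dataset size while the per-client class mix varies with `alpha`.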

Abstract

Federated Learning (FL) has emerged as a prominent distributed machine learning framework that enables geographically dispersed clients to train a global model collaboratively while preserving their privacy-sensitive data. However, because heterogeneous clients generate non-independent-and-identically-distributed (Non-IID) data, the performance of conventional federated optimization schemes such as FedAvg and its variants deteriorates, which calls for designs that adaptively adjust specific model parameters to alleviate the negative influence of heterogeneity. In this paper, by leveraging entropy as a new metric for assessing the degree of system disorder, we propose an adaptive FEDerated learning algorithm based on ENTropy theory (FedEnt) to alleviate the parameter deviation among heterogeneous clients and achieve fast convergence. Given the data disparity and parameter deviation of heterogeneous clients, however, determining the optimal dynamic learning rate for each client is challenging, as there is no communication among participating clients during the local training epochs. To enable a decentralized learning rate for each participating client, we first introduce mean-field terms to estimate the components associated with other clients' local parameters. Furthermore, we provide a rigorous theoretical analysis of the existence and determination of the mean-field estimators. Based on the mean-field estimators, the closed-form adaptive learning rate for each client is derived by constructing the Hamilton equation. Moreover, the convergence rate of our proposed FedEnt is proved. Extensive experimental results on real-world datasets (i.e., MNIST, EMNIST-L, CIFAR10, and CIFAR100) show that our FedEnt algorithm surpasses FedAvg and its variants (i.e., FedAdam, FedProx, and FedDyn) under Non-IID settings and achieves a faster convergence rate.
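To make the setting concrete, the sketch below shows one FedAvg-style round in which each client runs local gradient steps with its own learning rate before the server averages the results. It is only an illustration of where a per-client rate enters the update: the closed-form FedEnt rate and the mean-field estimators are not reproduced here, and `client_lrs`, `make_quadratic_grad`, `local_sgd`, and `federated_round` are hypothetical names with placeholder values.

```python
def make_quadratic_grad(center):
    """Gradient of the toy local loss 0.5 * ||w - center||^2 (an assumption)."""
    return lambda w: [wi - ci for wi, ci in zip(w, center)]


def local_sgd(w, grad_fn, lr, steps):
    """Run `steps` plain gradient steps from w with a fixed learning rate."""
    for _ in range(steps):
        g = grad_fn(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def federated_round(w_global, client_grads, client_lrs, client_sizes, steps=5):
    """One round: local training per client, then size-weighted averaging."""
    total = sum(client_sizes)
    new_w = [0.0] * len(w_global)
    for grad_fn, lr, n in zip(client_grads, client_lrs, client_sizes):
        # Each client starts from the current global model and uses its own
        # learning rate; an adaptive scheme would set `lr` per client here.
        w_local = local_sgd(list(w_global), grad_fn, lr, steps)
        for j, wj in enumerate(w_local):
            new_w[j] += (n / total) * wj
    return new_w


if __name__ == "__main__":
    grads = [make_quadratic_grad([1.0, 1.0]), make_quadratic_grad([3.0, -1.0])]
    w = [0.0, 0.0]
    for _ in range(20):
        w = federated_round(w, grads, client_lrs=[0.1, 0.05],
                            client_sizes=[100, 100])
    print(w)
```

With heterogeneous local objectives, the choice of per-client rates changes where the averaged model settles, which is the quantity an adaptive method tunes.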

Paper Structure (28 sections, 7 theorems, 49 equations, 13 figures, 4 tables, 2 algorithms)

Introduction
Related Work
Partial Optimization in FL
Asynchronous Optimization in FL
Adaptive Optimization in FL
System Model and Problem Formulation
Standard Federated Learning Model
Problem Formulation for New Entropy-based Adaptive Federated Learning
Analysis of Adaptive Learning Rate
Framework Overview of FedEnt
Adaptive Learning Rate for Each Client
Update of Mean-Field Estimators for Finalizing the Adaptive Learning Rate
Convergence Analysis for FedEnt
Bounding Client Drifting
Bounding Global Loss Function
...and 13 more sections

Key Result

Lemma 1

[Polyak-Łojasiewicz inequality] Denote the optimal global parameter as $\boldsymbol{w}^{*}$. The $L$-Lipschitz continuous global loss function $F(\boldsymbol{w})$ satisfies the Polyak-Łojasiewicz condition, i.e., the following holds for any $\delta > 0$ and $\boldsymbol{w} \in \mathbb{R}^{d}$:
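
For reference, the Polyak-Łojasiewicz condition is commonly written in the form below. This is the textbook convention with constant $\delta$, given here as an assumption; the paper's own statement may use a different normalization.

$$
\left\| \nabla F(\boldsymbol{w}) \right\|^{2} \;\geq\; 2\delta \left( F(\boldsymbol{w}) - F(\boldsymbol{w}^{*}) \right)
$$

Unlike strong convexity, this condition does not require $F$ to be convex; it only requires the gradient norm to grow at least as fast as the optimality gap.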

Figures (13)

Figure 1: The Federated Learning Framework with Entropy Approach.
Figure 2: The different data distributions on the CIFAR10 dataset.
Figure 3: The performance of different FL methods on the MNIST dataset with 20% of 100 clients.
Figure 4: The performance of different FL methods on the CIFAR10 dataset with 20% of 100 clients.
Figure 5: The performance of different FL methods on the EMNIST-L dataset with 20% of 100 clients.
...and 8 more figures

Theorems & Definitions (10)

Lemma 1
Remark 1
Definition 1
Definition 2
Proposition 1
Proposition 2
Proposition 3
Proposition 4
Theorem 1
Theorem 2
