FedImpro: Measuring and Improving Client Update in Federated Learning

Zhenheng Tang; Yonggang Zhang; Shaohuai Shi; Xinmei Tian; Tongliang Liu; Bo Han; Xiaowen Chu

TL;DR

FedImpro decouples the model into high-level and low-level components and trains the high-level portion on reconstructed feature distributions, which enhances the generalization contribution and reduces gradient dissimilarity in FL.

Abstract

Federated Learning (FL) models often experience client drift caused by heterogeneous data, where the data distribution differs across clients. To address this issue, prior work primarily focuses on manipulating the existing gradients to achieve more consistent client models. In this paper, we present an alternative perspective on client drift and aim to mitigate it by generating improved local models. First, we analyze the generalization contribution of local training and conclude that it is bounded by the conditional Wasserstein distance between the data distributions of different clients. Then, we propose FedImpro, which constructs similar conditional distributions for local training. Specifically, FedImpro decouples the model into high-level and low-level components and trains the high-level portion on reconstructed feature distributions. This approach enhances the generalization contribution and reduces the dissimilarity of gradients in FL. Experimental results show that FedImpro can help FL defend against data heterogeneity and enhance the generalization performance of the model.

Paper Structure (44 sections, 4 theorems, 29 equations, 17 figures, 11 tables, 1 algorithm)

Introduction
Related Works
Addressing Non-IID problem in FL
Measuring Contribution from Clients
Split Training
Privacy Concerns
Preliminaries
Problem Definition
Generalization Quantification
Decoupled Training Against Data Heterogeneity
Generalization Contribution
Decoupled Gradient Dissimilarity
Training Procedure
Experiments
Experiment Setup
...and 29 more sections

Key Result

Theorem 4.1

With the pseudo gradient $\Delta$ obtained by $\mathbf{L}(\mathcal{D}_m)$, the generalization contribution is lower bounded in terms of the conditional Wasserstein distance between the data distributions of different clients. Here, $\tilde{\mathcal{D}}_m$ represents the dataset sampled from $\mathcal{D}_m$.
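To give a feel for the quantity the abstract ties this bound to, the following is a minimal sketch of a label-conditional Wasserstein-1 distance between two clients. It is an illustration only, not the paper's definition: it assumes scalar features, equal sample counts per label on both clients, and a uniform average over the labels both clients share. The function names `wasserstein_1d` and `conditional_wasserstein_1d` are hypothetical.

```python
from collections import defaultdict


def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size 1-D empirical samples.

    For equal-size samples on the real line, the optimal coupling
    matches the i-th smallest of xs with the i-th smallest of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


def conditional_wasserstein_1d(client_a, client_b):
    """Average per-label W1 over labels present on both clients.

    client_a, client_b: lists of (feature, label) pairs.
    Uniform label weighting is an assumption of this sketch.
    """
    by_label_a, by_label_b = defaultdict(list), defaultdict(list)
    for x, y in client_a:
        by_label_a[y].append(x)
    for x, y in client_b:
        by_label_b[y].append(x)
    shared = sorted(set(by_label_a) & set(by_label_b))
    if not shared:
        raise ValueError("clients share no labels")
    total = sum(wasserstein_1d(by_label_a[y], by_label_b[y]) for y in shared)
    return total / len(shared)
```

Two clients whose per-label samples coincide give a distance of zero under this sketch, while a shift in one label's features raises the average in proportion to the shift.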

Figures (17)

Figure 1: Training process of our framework. On any $m$-th Client, the low-level model uses the raw data $x_m$ as input, and outputs feature $h_m$. The high-level model uses $h_m$ and samples $\hat{h}$ from a shared distribution $\mathcal{H}^r$ as input for forward and backward propagation. Noises will be added to the locally estimated $\mathcal{H}^{r+1}_m$ before aggregation on the Server to update the global $\mathcal{H}^{r+1}$. Model parameters follow the FedAvg aggregation or other FL aggregation algorithms.
Figure 2: CIFAR10 with $a=0.1$, $E=1$, $M=10$.
Figure 3: Layer divergence of FedAvg.
Figure 4: Convergence comparison of CIFAR-10.
Figure 5: Convergence comparison of FMNIST.
...and 12 more figures
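The data flow described in the Figure 1 caption can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: features are scalars, the shared distribution is modeled as a per-label Gaussian, the noise is Gaussian, and the server simply averages client estimates. None of these choices is confirmed by the caption, and all function names are hypothetical.

```python
import random
import statistics


def estimate_stats(features_by_label):
    """Client side: per-label (mean, std) of low-level features h_m."""
    return {
        y: (statistics.fmean(hs), statistics.pstdev(hs))
        for y, hs in features_by_label.items()
    }


def add_noise(stats, noise_scale, rng):
    """Client side: perturb local estimates before sending them (assumed Gaussian noise)."""
    return {
        y: (mu + rng.gauss(0.0, noise_scale), abs(sd + rng.gauss(0.0, noise_scale)))
        for y, (mu, sd) in stats.items()
    }


def aggregate(client_stats):
    """Server side: average each label's estimates over the clients that report it."""
    merged = {}
    for stats in client_stats:
        for y, pair in stats.items():
            merged.setdefault(y, []).append(pair)
    return {
        y: (statistics.fmean(m for m, _ in v), statistics.fmean(s for _, s in v))
        for y, v in merged.items()
    }


def high_level_batch(features_by_label, shared_stats, n_per_label, rng):
    """Inputs for the high-level model: local h_m plus samples h_hat from the shared estimate."""
    local = [(h, y) for y, hs in features_by_label.items() for h in hs]
    sampled = [
        (rng.gauss(mu, sd), y)
        for y, (mu, sd) in sorted(shared_stats.items())
        for _ in range(n_per_label)
    ]
    return local + sampled
```

In this sketch, a client that holds only one class still feeds the high-level model features for classes it has never seen, because the sampled portion of the batch comes from the shared estimate rather than from local data.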

Theorems & Definitions (11)

Definition 3.1
Definition 4.1
Theorem 4.1
Remark 4.1
Definition 4.2
Theorem 4.2
Remark 4.2
Theorem C.1
proof
Theorem C.2
...and 1 more
