FedImpro: Measuring and Improving Client Update in Federated Learning

Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xinmei Tian, Tongliang Liu, Bo Han, Xiaowen Chu

TL;DR

FedImpro decouples the model into high-level and low-level components and trains the high-level portion on reconstructed feature distributions, which enhances the generalization contribution and reduces the dissimilarity of gradients in FL.

Abstract

Federated Learning (FL) models often experience client drift caused by heterogeneous data, where the distribution of data differs across clients. To address this issue, advanced research primarily focuses on manipulating the existing gradients to achieve more consistent client models. In this paper, we present an alternative perspective on client drift and aim to mitigate it by generating improved local models. First, we analyze the generalization contribution of local training and conclude that this generalization contribution is bounded by the conditional Wasserstein distance between the data distributions of different clients. Then, we propose FedImpro to construct similar conditional distributions for local training. Specifically, FedImpro decouples the model into high-level and low-level components and trains the high-level portion on reconstructed feature distributions. This approach enhances the generalization contribution and reduces the dissimilarity of gradients in FL. Experimental results show that FedImpro can help FL defend against data heterogeneity and enhance the generalization performance of the model.
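
To make the notion of a conditional Wasserstein distance between clients more concrete, the following is a minimal sketch for scalar data. It relies on the standard fact that, in one dimension, the empirical 1-Wasserstein distance between two equal-size samples is the mean absolute difference of the sorted samples. The function names, the toy clients, and the uniform average over shared labels are assumptions made for illustration; the paper's formal definition may weight labels differently.

```python
from collections import defaultdict


def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In one dimension the optimal coupling pairs the sorted samples.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def group_by_label(pairs):
    """Group (value, label) pairs into {label: [values]}."""
    groups = defaultdict(list)
    for value, label in pairs:
        groups[label].append(value)
    return groups


def conditional_wasserstein(client_a, client_b):
    """Average the per-label distance over labels present on both clients.

    Assumption: a uniform average over shared labels (illustrative only).
    """
    a, b = group_by_label(client_a), group_by_label(client_b)
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("clients share no labels")
    return sum(wasserstein_1d(a[y], b[y]) for y in shared) / len(shared)


# Hypothetical clients holding (value, label) pairs.
client_a = [(0.0, 0), (1.0, 0), (5.0, 1), (6.0, 1)]
client_b = [(0.5, 0), (1.5, 0), (5.0, 1), (6.0, 1)]   # close to client_a
client_c = [(3.0, 0), (4.0, 0), (9.0, 1), (10.0, 1)]  # far from client_a

dist_ab = conditional_wasserstein(client_a, client_b)
dist_ac = conditional_wasserstein(client_a, client_c)
print(dist_ab, dist_ac)
```

In this toy setting the pair with similar class-conditional distributions yields the smaller distance, which is the kind of similarity FedImpro aims to construct for local training.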

Paper Structure

This paper contains 44 sections, 4 theorems, 29 equations, 17 figures, 11 tables, and 1 algorithm.

Key Result

Theorem 4.1

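With the pseudo gradient $\Delta$ obtained by $\mathbf{L}(\mathcal{D}_m)$, the generalization contribution is lower bounded by a term that depends on the conditional Wasserstein distance between the data distributions of different clients, where $\tilde{\mathcal{D}}_m$ represents the dataset sampled from $\mathcal{D}_m$.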

Figures (17)

  • Figure 1: Training process of our framework. On the $m$-th client, the low-level model takes the raw data $x_m$ as input and outputs the feature $h_m$. The high-level model takes both $h_m$ and samples $\hat{h}$ drawn from a shared distribution $\mathcal{H}^r$ as input for forward and backward propagation. Noise is added to the locally estimated $\mathcal{H}^{r+1}_m$ before aggregation on the server, which updates the global $\mathcal{H}^{r+1}$. Model parameters follow FedAvg or another FL aggregation algorithm.
  • Figure 2: CIFAR-10 with $a=0.1$, $E=1$, $M=10$.
  • Figure 3: Layer divergence of FedAvg.
  • Figure 4: Convergence comparison of CIFAR-10.
  • Figure 5: Convergence comparison of FMNIST.
  • ...and 12 more figures
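
The feature-distribution exchange described in the Figure 1 caption can be sketched roughly as follows. This is an illustrative simplification, not the authors' implementation: it assumes scalar features, a per-label Gaussian summary (mean and variance), Gaussian noise on the uploaded statistics, and count-weighted averaging on the server. All function names and the toy data are hypothetical.

```python
import random
import statistics


def estimate_local_stats(features_by_label):
    """Client side: per-label (mean, variance, count) of scalar features h_m."""
    return {
        label: (statistics.fmean(h), statistics.pvariance(h), len(h))
        for label, h in features_by_label.items()
    }


def add_noise(stats, sigma, rng):
    """Perturb the estimated statistics before upload (assumed Gaussian noise)."""
    return {
        label: (
            mean + rng.gauss(0.0, sigma),
            max(var + rng.gauss(0.0, sigma), 1e-8),  # keep the variance positive
            n,
        )
        for label, (mean, var, n) in stats.items()
    }


def aggregate(client_stats):
    """Server side: count-weighted average of means and variances per label."""
    totals = {}
    for stats in client_stats:
        for label, (mean, var, n) in stats.items():
            m_sum, v_sum, n_sum = totals.get(label, (0.0, 0.0, 0))
            totals[label] = (m_sum + n * mean, v_sum + n * var, n_sum + n)
    return {label: (m / n, v / n) for label, (m, v, n) in totals.items()}


def sample_shared_features(shared, label, k, rng):
    """Draw k features h_hat for one label from the shared Gaussian summary."""
    mean, var = shared[label]
    return [rng.gauss(mean, var ** 0.5) for _ in range(k)]


rng = random.Random(0)
# Hypothetical low-level features, grouped by label, on two clients.
client_1 = {0: [0.9, 1.1, 1.0], 1: [4.8, 5.2]}
client_2 = {0: [1.4, 1.6], 1: [5.9, 6.1, 6.0]}

uploads = [
    add_noise(estimate_local_stats(c), sigma=0.05, rng=rng)
    for c in (client_1, client_2)
]
shared = aggregate(uploads)
h_hat = sample_shared_features(shared, label=0, k=4, rng=rng)
print(shared, h_hat)
```

In a full training loop, the sampled `h_hat` would be fed to the high-level model alongside the locally computed `h_m`, while model parameters are aggregated separately (for example with FedAvg).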

Theorems & Definitions (11)

  • Definition 3.1
  • Definition 4.1
  • Theorem 4.1
  • Remark 4.1
  • Definition 4.2
  • Theorem 4.2
  • Remark 4.2
  • Theorem C.1
  • proof
  • Theorem C.2
  • ...and 1 more