
ISFL: Federated Learning for Non-i.i.d. Data with Local Importance Sampling

Zheqi Zhu, Yuchen Shi, Pingyi Fan, Chenghui Peng, Khaled B. Letaief

TL;DR

ISFL tackles the gradient-diversity challenge of non-i.i.d. data in federated learning by introducing local importance sampling that reweights client-side training samples to better align with the global objective. The authors derive a convergence bound that explicitly accounts for local IS and formulate per-client optimal IS strategies via a water-filling optimization, enabling practical computation of IS weights. They design ISFL algorithms that update IS weights synchronously with federated rounds and validate the approach on CIFAR-10/100, showing improved accuracy, faster convergence, and better sampling efficiency compared to non-IS baselines. The work provides theoretical guarantees for neural networks and demonstrates that local IS can be a drop-in enhancement for diverse FL frameworks, improving robustness to label-skewed non-i.i.d. data and enabling more data-efficient training.
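As a rough illustration of what reweighting client-side samples means mechanically, the sketch assumes each client holds per-sample gradients and a hypothetical per-sample IS probability vector `q`. The paper's $\{q^k_i\}$ may be defined per sample or per class, and that detail is not given here. Drawing a minibatch from `q` produces a gradient whose expectation is $\sum_i q_i \nabla \ell_i$, which is the gradient of a $q$-reweighted local objective. In this sketch no $1/(n q_i)$ correction is applied, since the goal is to shift the local objective rather than to estimate the uniform one without bias. This is a generic sketch under those assumptions, not the ISFL algorithm itself; both function names are hypothetical.

```python
import numpy as np

def is_minibatch_gradient(per_sample_grads, q, batch_size, rng):
    """Average gradient of a minibatch drawn with replacement from IS probabilities q.

    per_sample_grads: (n, d) array, gradient of each local training sample.
    q: (n,) array of hypothetical IS probabilities (non-negative, summing to 1).
    """
    grads = np.asarray(per_sample_grads, dtype=float)
    q = np.asarray(q, dtype=float)
    idx = rng.choice(grads.shape[0], size=batch_size, replace=True, p=q)
    # No 1/(n * q_i) correction: sampled gradients are averaged as-is,
    # so the estimator targets the q-reweighted local objective.
    return grads[idx].mean(axis=0)

def expected_is_gradient(per_sample_grads, q):
    """Exact expectation of the estimator above: sum_i q_i * g_i."""
    return np.asarray(q, dtype=float) @ np.asarray(per_sample_grads, dtype=float)
```

For example, with gradients `[[1, 0], [0, 1], [1, 1]]` and `q = [0.6, 0.3, 0.1]`, `expected_is_gradient` gives `[0.7, 0.4]`, and a large minibatch from `is_minibatch_gradient` lands close to that vector rather than to the uniform average `[2/3, 2/3]`.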

Abstract

As a promising learning paradigm that integrates computation and communication, federated learning (FL) alternates between local training on distributed clients and periodic model sharing. Because data are distributed non-i.i.d. across clients, FL models suffer from gradient diversity, degraded performance, poor convergence, and related problems. In this work, we aim to tackle this key issue by adopting importance sampling (IS) for local training. We propose importance sampling federated learning (ISFL), an explicit framework with theoretical guarantees. First, we derive a convergence theorem for ISFL that captures the effects of local importance sampling. Then, we formulate the problem of selecting the optimal IS weights and obtain its theoretical solutions. We also employ a water-filling method to calculate the IS weights and develop the ISFL algorithms. The experimental results on CIFAR-10 agree well with the proposed theorems and verify that ISFL achieves better performance, sampling efficiency, and explainability on non-i.i.d. data. To the best of our knowledge, ISFL is the first non-i.i.d. FL solution from the local sampling perspective that is theoretically compatible with neural network models. Furthermore, as a local sampling approach, ISFL can be easily migrated into other emerging FL frameworks.
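The water-filling method mentioned in the abstract is a classic procedure: given "floor" values $a_i$, a fixed budget is poured over them so that $x_i=\max(0,\gamma-a_i)$ and $\sum_i x_i$ equals the budget. The sketch implements that textbook version; with a budget of 1 the output is a probability vector. The `floors` argument and the returned level are hypothetical placeholders: how ISFL maps the terms of its bound to floors (Theorems 2 and 3) is not reproduced here, so treat this as the generic technique only.

```python
import numpy as np

def water_fill(floors, total=1.0):
    """Classic water-filling: x_i = max(0, level - floors_i) with sum(x) == total.

    floors: hypothetical per-item floor values a_i (any real numbers).
    total: budget to distribute (1.0 yields a probability vector).
    Returns (x, level).
    """
    raw = np.asarray(floors, dtype=float)
    if raw.size == 0 or total <= 0:
        raise ValueError("need at least one floor and a positive budget")
    a = np.sort(raw)
    n = a.size
    for k in range(1, n + 1):
        # Level if exactly the k lowest floors are under water.
        level = (total + a[:k].sum()) / k
        # Stop once the level no longer reaches the next-higher floor.
        if k == n or level <= a[k]:
            break
    # Items whose floor lies above the level receive nothing.
    x = np.maximum(0.0, level - raw)
    return x, level
```

With floors `[0.1, 0.3, 0.9]` the level settles at about 0.7, giving weights of roughly `[0.6, 0.4, 0.0]`: the item with the highest floor is left "dry". Sorting makes this O(n log n); a bisection on the level is a common alternative when the floors come from a more complex expression.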

Paper Structure

This paper contains 31 sections, 4 theorems, 36 equations, 9 figures, 4 tables, 2 algorithms.

Key Result

Theorem 1

Consider a given $T$-step range $\mathcal{T}$ from $T_0$ to $T_1$. With a properly chosen step size $\eta$, the expected gradient norm of ISFL with IS probabilities $\{q^k_i\}$ can be upper bounded by: [...] where [...]. Therein, $\bar{L}_\mathcal{T}=\sum_i p_i L_{i}(\mathcal{T})$ is the average gradient Lipschitz constant of the global model, and $t_c=\lfloor\frac{t}{E_l}\rfloor\cdot E_l$ (models being aggregated every $E_l$ epochs) is the latest epoch of model aggregation ...
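The two auxiliary quantities named in Theorem 1 are easy to evaluate directly. The sketch assumes that the $p_i$ are client aggregation weights summing to one, that the per-client Lipschitz constants $L_i(\mathcal{T})$ are already known numbers, and that aggregation happens at epochs $0, E_l, 2E_l, \dots$; the function names are hypothetical.

```python
from typing import Sequence

def latest_aggregation_epoch(t: int, E_l: int) -> int:
    """t_c = floor(t / E_l) * E_l: last aggregation epoch at or before epoch t,
    assuming aggregation every E_l epochs starting from epoch 0."""
    if E_l <= 0:
        raise ValueError("E_l must be positive")
    return (t // E_l) * E_l

def average_lipschitz(p: Sequence[float], L: Sequence[float]) -> float:
    """bar L_T = sum_i p_i * L_i(T), assuming p are client weights summing to 1."""
    if len(p) != len(L):
        raise ValueError("p and L must have the same length")
    return sum(p_i * L_i for p_i, L_i in zip(p, L))
```

For instance, with $E_l=2$ both epoch 6 and epoch 7 map to $t_c=6$, and two equally weighted clients with Lipschitz constants 2 and 4 give $\bar{L}_\mathcal{T}=3$.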

Figures (9)

  • Figure 1: A sketch of the impact of non-i.i.d. data (w/ and w/o IS): FedAvg with $K=2$ and $E_l=2$. The arrows represent the model evolution.
  • Figure 2: The workflow of the ISFL framework.
  • Figure 3: The water-filling sketch of the optimal IS weights.
  • Figure 4: Comparisons of several FL schemes on CIFAR-10.
  • Figure 5: Comparisons with non-IS-based FL schemes under non-i.i.d. settings.
  • ...and 4 more figures
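Figure 1 sketches how non-i.i.d. data pulls FedAvg with $K=2$ clients and $E_l=2$ off course. A toy numeric analogue, assuming two clients with scalar quadratic losses $f_k(w)=\tfrac{a_k}{2}(w-c_k)^2$ of different curvature and minimizer (a stand-in for non-i.i.d. data, not the paper's experiment, and with no IS applied), shows the effect: with one local step per round, FedAvg's fixed point equals the global optimum $\sum_k p_k a_k c_k / \sum_k p_k a_k$, whereas with two local steps it settles elsewhere. The function names are hypothetical.

```python
import numpy as np

def fedavg_toy(a, c, p, lr, local_steps, rounds=2000, w0=0.0):
    """FedAvg on scalar quadratics f_k(w) = (a_k / 2) * (w - c_k)**2 (toy assumption)."""
    a = np.asarray(a, dtype=float)
    c = np.asarray(c, dtype=float)
    p = np.asarray(p, dtype=float)
    w = float(w0)
    for _ in range(rounds):
        local_models = []
        for a_k, c_k in zip(a, c):
            w_k = w
            for _ in range(local_steps):
                # One gradient-descent step on client k's quadratic loss.
                w_k -= lr * a_k * (w_k - c_k)
            local_models.append(w_k)
        # Server aggregation: weighted average of local models with weights p_k.
        w = float(np.dot(p, local_models))
    return w

def global_optimum(a, c, p):
    """Minimizer of sum_k p_k * f_k(w) for the same quadratics."""
    a, c, p = (np.asarray(v, dtype=float) for v in (a, c, p))
    return float(np.sum(p * a * c) / np.sum(p * a))
```

With `a = [1.0, 4.0]`, `c = [0.0, 1.0]`, `p = [0.5, 0.5]` and `lr = 0.1`, the global optimum is 0.8 and one local step reaches it, while `local_steps=2` converges to about 0.771. The gap comes from the unequal curvatures: each client contracts toward its own minimizer at a different rate between aggregations.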

Theorems & Definitions (12)

  • Theorem 1: Convergence analysis
  • Remark 1: Interpretation of the bound
  • Remark 2: Differences from general convergence bounds for non-i.i.d. FL
  • Theorem 2: Optimal IS Probabilities
  • Remark 3
  • Remark 4: Comparison with IS-based FL
  • Theorem 3: Calculation of $\Gamma^*$
  • Lemma 1
  • Proof
  • Proof of Theorem 1
  • ...and 2 more