ISFL: Federated Learning for Non-i.i.d. Data with Local Importance Sampling

Zheqi Zhu; Yuchen Shi; Pingyi Fan; Chenghui Peng; Khaled B. Letaief

TL;DR

ISFL tackles the gradient-diversity challenge of non-i.i.d. data in federated learning by introducing local importance sampling that reweights client-side training samples to better align with the global objective. The authors derive a convergence bound that explicitly accounts for local IS and formulate per-client optimal IS strategies via a water-filling optimization, enabling practical computation of IS weights. They design ISFL algorithms that update IS weights synchronously with federated rounds and validate the approach on CIFAR-10/100, showing improved accuracy, faster convergence, and better sampling efficiency compared to non-IS baselines. The work provides theoretical guarantees for neural networks and demonstrates that local IS can be a drop-in enhancement for diverse FL frameworks, improving robustness to label-skewed non-i.i.d. data and enabling more data-efficient training.

Abstract

As a promising learning paradigm that integrates computation and communication, federated learning (FL) carries out local training and periodic sharing across distributed clients. Because the data on clients are non-i.i.d., FL models suffer from gradient diversity, degraded performance, and poor convergence. In this work, we tackle this key issue by adopting importance sampling (IS) for local training. We propose importance sampling federated learning (ISFL), an explicit framework with theoretical guarantees. First, we derive a convergence theorem for ISFL that captures the effects of local importance sampling. Then, we formulate the problem of selecting optimal IS weights and obtain the theoretical solutions. We also employ a water-filling method to calculate the IS weights and develop the ISFL algorithms. The experimental results on CIFAR-10 fit the proposed theorems well and verify that ISFL achieves better performance, sampling efficiency, and explainability on non-i.i.d. data. To the best of our knowledge, ISFL is the first non-i.i.d. FL solution from the local sampling perspective that exhibits theoretical compatibility with neural network models. Furthermore, as a local sampling approach, ISFL can be easily migrated into other emerging FL frameworks.
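To make the idea of local importance sampling concrete, the following is a minimal sketch of a generic importance-sampled mini-batch gradient on a single client. It is not the ISFL algorithm: the sampling probabilities `q` here are arbitrary rather than the optimal weights the paper derives, the target distribution is assumed to be uniform over the `n` local samples, and the function names and toy gradients are hypothetical. It only shows the standard reweighting by `1 / (n * q_i)` that keeps the estimate unbiased when samples are drawn non-uniformly.

```python
import numpy as np


def expected_is_gradient(per_sample_grads, q):
    """Exact expectation of one importance-weighted gradient draw.

    Each sample i is drawn with probability q[i] and reweighted by
    1 / (n * q[i]); the weights cancel the sampling bias.
    """
    n = per_sample_grads.shape[0]
    weights = 1.0 / (n * q)
    return (q[:, None] * weights[:, None] * per_sample_grads).sum(axis=0)


def is_minibatch_gradient(per_sample_grads, q, batch_size, rng):
    """Monte Carlo estimate of the mean gradient under sampling probs q.

    per_sample_grads: array of shape (n, d), one gradient per local sample.
    q: array of shape (n,), strictly positive and summing to 1.
    """
    n = per_sample_grads.shape[0]
    idx = rng.choice(n, size=batch_size, replace=True, p=q)
    weights = 1.0 / (n * q[idx])  # ratio of uniform target 1/n to q_i
    return (weights[:, None] * per_sample_grads[idx]).mean(axis=0)


# Toy per-sample gradients and a non-uniform sampling distribution
# (illustrative values only, not taken from the paper).
grads = np.array(
    [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, -0.5], [4.0, 1.0]]
)
q = np.array([0.05, 0.05, 0.2, 0.2, 0.1, 0.4])

rng = np.random.default_rng(0)
estimate = is_minibatch_gradient(grads, q, batch_size=200_000, rng=rng)
print("uniform mean gradient:", grads.mean(axis=0))
print("IS estimate          :", estimate)
```

The choice of `q` changes only the variance of the estimate, not its expectation, which is why the selection of IS weights can be posed as an optimization problem.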

Paper Structure (31 sections, 4 theorems, 36 equations, 9 figures, 4 tables, 2 algorithms)

Introduction
Background
Motivation
Related Work
Contributions & Paper Organization
Preliminaries and ISFL Framework
Federated Learning for Non-i.i.d. Data
Importance Sampling
ISFL Framework
Theoretical Results
Convergence Analysis with Local Importance Sampling
Optimal Importance Sampling Strategies
Algorithm Designs
ISFL Algorithms
Practical Issues
...and 16 more sections

Key Result

Theorem 1

Consider the given $T$-step range $\mathcal{T}$ from $T_0$ to $T_1$. By setting a proper $\eta$, the expected gradient norm of ISFL with IS probabilities $\{q^k_i\}$ can be upper bounded (the bound and its auxiliary terms are not reproduced here). Therein, $\bar{L}_\mathcal{T}=\sum_i p_i L_{i}(\mathcal{T})$ is the average gradient Lipschitz constant of the global model, and $t_c=\lfloor\frac{t}{E_l}\rfloor\cdot E_l$ is the latest epoch of model aggregation ...
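The index $t_c=\lfloor t/E_l\rfloor\cdot E_l$ from Theorem 1 can be illustrated with a short sketch. The function name is hypothetical, and reading $E_l$ as the number of local steps between two aggregations is an assumption, consistent with the $E_l=2$ setting in Figure 1.

```python
def latest_aggregation_step(t: int, local_steps: int) -> int:
    """Return t_c = floor(t / E_l) * E_l for a non-negative step t.

    local_steps plays the role of E_l (assumed here to be the number of
    local steps between two model aggregations).
    """
    if t < 0 or local_steps <= 0:
        raise ValueError("t must be >= 0 and local_steps must be > 0")
    return (t // local_steps) * local_steps


# With E_l = 2, steps 0..5 map to the most recent aggregation step.
print([latest_aggregation_step(t, 2) for t in range(6)])  # [0, 0, 2, 2, 4, 4]
```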

Figures (9)

Figure 1: A sketch for the impact of non-i.i.d. data (w/ and w/o IS): FedAvg with $K=2$ and $E_l=2$. The arrows represent the model evolution.
Figure 2: The workflow of ISFL framework.
Figure 3: The water-filling sketch of the optimal IS weights.
Figure 4: Comparisons of several FL schemes on CIFAR-10.
Figure 5: Comparisons with non-IS based FL schemes for non-iid settings.
...and 4 more figures

Theorems & Definitions (12)

Theorem 1: Convergence analysis
Remark 1: Interpretation of the bound
Remark 2: The differences with general convergence bounds of non-i.i.d. FL
Theorem 2: Optimal IS Probabilities
Remark 3
Remark 4: The comparison with IS based FL
Theorem 3: Calculation of $\Gamma^*$
Lemma 1
Proof
Proof of Theorem 1
...and 2 more
