Approximate Gradient Coding for Privacy-Flexible Federated Learning with Non-IID Data

Okko Makkonen, Sampo Niemelä, Camilla Hollanti, Serge Kas Hanna

TL;DR

The paper tackles federated learning under non-IID data and client stragglers by introducing a privacy-flexible paradigm in which a configurable portion of each client’s data is designated non-private. It combines a one-time offline randomized data sharing phase to reduce label heterogeneity with an approximate gradient coding scheme to tolerate stragglers, yielding an unbiased gradient estimator and a provable variance reduction. Theoretical results quantify how the expected heterogeneity diminishes by a factor related to the replication parameter $d$ and privacy level $c$, and how gradient variance decreases with straggler probability $p$, data replication, and privacy. Empirical validation on MNIST demonstrates faster convergence and robustness to non-IID distributions when appropriately choosing $(c,d)$, illustrating a practical privacy-utility trade-off with a manageable offline cost.

Abstract

This work focuses on the challenges of non-IID data and stragglers/dropouts in federated learning. We introduce and explore a privacy-flexible paradigm that models parts of the clients' local data as non-private, offering a more versatile and business-oriented perspective on privacy. Within this framework, we propose a data-driven strategy for mitigating the effects of label heterogeneity and client straggling on federated learning. Our solution combines both offline data sharing and approximate gradient coding techniques. Through numerical simulations using the MNIST dataset, we demonstrate that our approach enables achieving a deliberate trade-off between privacy and utility, leading to improved model convergence and accuracy while using an adaptable portion of non-private data.
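
To make the idea of approximate gradient coding concrete, here is a minimal sketch of an unbiased gradient estimator under random stragglers. It is an illustration under stated assumptions, not the paper's scheme. It assumes a cyclic placement in which every data block is replicated on exactly $d$ clients, independent straggling with probability $p < 1$, and a server that rescales the received sum by $1/(d(1-p))$. The paper's randomized data sharing and its privacy level $c$ are not modeled, and the names `cyclic_assignment` and `estimate_gradient` are hypothetical.

```python
import random


def cyclic_assignment(num_clients, d):
    """Assumed placement: client i holds blocks i, i+1, ..., i+d-1 (mod num_clients),
    so every block is replicated on exactly d clients."""
    return [[(i + k) % num_clients for k in range(d)] for i in range(num_clients)]


def estimate_gradient(block_grads, assignment, p, d, rng):
    """Return one realization of the rescaled sum of gradients from non-straggling clients.

    block_grads[j] is the gradient (list of floats) computed on data block j.
    Each client straggles independently with probability p (requires p < 1).
    """
    dim = len(block_grads[0])
    total = [0.0] * dim
    for blocks in assignment:
        if rng.random() < p:
            continue  # this client is a straggler; its contribution is lost
        for j in blocks:
            for k in range(dim):
                total[k] += block_grads[j][k]
    # Each block is expected to arrive d * (1 - p) times, so rescaling by that
    # count makes the estimator unbiased for the full gradient sum.
    scale = 1.0 / (d * (1.0 - p))
    return [scale * v for v in total]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[float(j + 1), -float(j)] for j in range(6)]  # toy per-block gradients
    assignment = cyclic_assignment(num_clients=6, d=2)
    print(estimate_gradient(grads, assignment, p=0.3, d=2, rng=rng))
```

With $p = 0$ the estimator returns the exact full gradient. For $p > 0$ individual realizations fluctuate, but their average converges to the full gradient, which is the sense in which such an estimator is unbiased.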

Paper Structure

This paper contains 13 sections, 3 theorems, 26 equations, 3 figures.

Key Result

Theorem 1

For any realization $\boldsymbol{X}\in \Delta^{N-1}$ of $\boldsymbol{\mathsf{X}}$, the randomized data sharing generates label proportions satisfying where the expectation is over the randomness of the data sharing scheme. Furthermore, for $K\gg 1$, we have

Figures (3)

  • Figure 1: Average testing accuracy and second moment of the gradient estimator (defined in \ref{est}) as a function of the communication round (iteration) in the Dirichlet setting with $\alpha = 0.1$.
  • Figure 2: Average testing accuracy and second moment of the gradient estimator (defined in \ref{est}) as a function of the communication round (iteration) in the single-class setting.
  • Figure 3: The simulation setup of Section \ref{simul_setup}, based on the MNIST dataset, under the single-class label-heterogeneous setting. The figure shows the sums of the scalar products of gradients computed on examples from the same class (Type I) and from different classes (Type II), reported at each iteration $t$ of the algorithm. The squared Euclidean norm of the full gradient $\mathbf{g}^{(t)}= \sum_{j=1}^{M} \mathbf{g}^{(t)}_j$, which equals the sum of the Type I and Type II scalar products, is also shown. A fixed learning rate of $\eta=0.01$ is used.

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • Proposition 1
  • Remark 1