Enhanced Federated Optimization: Adaptive Unbiased Client Sampling with Reduced Variance

Dun Zeng, Zenglin Xu, Yu Pan, Xu Luo, Qifan Wang, Xiaoying Tang

TL;DR

This work presents K-Vib, the first adaptive client sampler to employ an independent sampling procedure. Empirical studies indicate that K-Vib doubles the speed compared to baseline algorithms, demonstrating significant potential in federated optimization.

Abstract

Federated Learning (FL) is a distributed learning paradigm to train a global model across multiple devices without collecting local data. In FL, a server typically selects a subset of clients for each training round to optimize resource usage. Central to this process is the technique of unbiased client sampling, which ensures a representative selection of clients. Current methods primarily utilize a random sampling procedure which, despite its effectiveness, achieves suboptimal efficiency owing to the loose upper bound caused by the sampling variance. In this work, by adopting an independent sampling procedure, we propose a federated optimization framework focused on adaptive unbiased client sampling, improving the convergence rate via an online variance reduction strategy. In particular, we present the first adaptive client sampler, K-Vib, employing an independent sampling procedure. K-Vib achieves a linear speed-up on the regret bound $\tilde{\mathcal{O}}\big(N^{\frac{1}{3}}T^{\frac{2}{3}}/K^{\frac{4}{3}}\big)$ within a set communication budget $K$. Empirical studies indicate that K-Vib doubles the speed compared to baseline algorithms, demonstrating significant potential in federated optimization.
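To make "unbiased client sampling" with an independent sampling procedure concrete, the following is a minimal sketch rather than the paper's K-Vib sampler. It assumes scalar client updates, fixed (non-adaptive) inclusion probabilities that sum to the budget $K$, and the standard inverse-probability-weighted estimate; the function names and the toy numbers are hypothetical. It enumerates every client subset to confirm that the estimate equals the full-participation aggregate in expectation, and that its variance matches the closed form $\sum_i \frac{1-p_i}{p_i}(\lambda_i g_i)^2$ that holds for independent Bernoulli inclusion.

```python
from fractions import Fraction
from itertools import product


def isp_estimate(updates, weights, probs, mask):
    """Inverse-probability-weighted estimate from the clients selected in `mask`."""
    return sum(w * g / p for g, w, p, m in zip(updates, weights, probs, mask) if m)


def isp_moments(updates, weights, probs):
    """Exact mean and variance of the estimate, enumerating all 2^N client subsets."""
    mean = Fraction(0)
    second = Fraction(0)
    for mask in product([0, 1], repeat=len(updates)):
        # Each client joins independently, so a subset's probability is a product.
        prob = Fraction(1)
        for p, m in zip(probs, mask):
            prob *= p if m else 1 - p
        est = isp_estimate(updates, weights, probs, mask)
        mean += prob * est
        second += prob * est * est
    return mean, second - mean * mean


# Hypothetical toy problem (not from the paper): N = 4 clients, budget K = 2.
updates = [Fraction(1), Fraction(2), Fraction(3), Fraction(6)]
weights = [Fraction(1, 4)] * 4
probs = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2), Fraction(1)]  # sums to K

full = sum(w * g for g, w in zip(updates, weights))  # full-participation aggregate
mean, var = isp_moments(updates, weights, probs)
closed_form = sum((1 - p) / p * (w * g) ** 2 for g, w, p in zip(updates, weights, probs))

print("full participation:", full)
print("expected estimate: ", mean)
print("variance:", var, "closed form:", closed_form)
```

Exact rational arithmetic is used so the unbiasedness check is an equality rather than a Monte Carlo approximation; the enumeration is only practical for a handful of clients.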

Paper Structure (44 sections, 19 theorems, 100 equations, 7 figures, 2 algorithms)

Key Result

Lemma 2.1

For any communication round $t \in [T]$ in FL, random sampling yields $\mathbf{P}_{ij}^t = \text{Prob}(i,j\in S^t) = \frac{K(K-1)}{N(N-1)}$, and independent sampling yields $\mathbf{P}_{ij}^t = \text{Prob}(i,j\in S^t) = \boldsymbol{p}_i^t \boldsymbol{p}_j^t$; they admit
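The two pairwise inclusion probabilities quoted in the lemma statement can be checked by exhaustive enumeration on a small example. The sketch below is illustrative only: it assumes that "random sampling" means drawing a uniformly random size-$K$ subset without replacement, and the helper names and probabilities are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def pair_prob_random(n, k, i, j):
    """Exact Prob(i, j in S) when S is a uniformly random size-k subset of n clients."""
    subsets = list(combinations(range(n), k))
    hits = sum(1 for s in subsets if i in s and j in s)
    return Fraction(hits, len(subsets))


def pair_prob_independent(probs, i, j):
    """Exact Prob(i, j in S) when client c joins S independently with probability probs[c]."""
    total = Fraction(0)
    for mask in product([0, 1], repeat=len(probs)):
        if mask[i] and mask[j]:
            weight = Fraction(1)
            for p, m in zip(probs, mask):
                weight *= p if m else 1 - p
            total += weight
    return total


# Hypothetical setting: N = 5 clients, budget K = 2, probabilities summing to K.
n, k = 5, 2
probs = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(6, 10), Fraction(8, 10)]

rsp = pair_prob_random(n, k, 0, 3)
isp = pair_prob_independent(probs, 0, 3)

print("random sampling:     ", rsp, "vs K(K-1)/(N(N-1)) =", Fraction(k * (k - 1), n * (n - 1)))
print("independent sampling:", isp, "vs p_i * p_j =", probs[0] * probs[3])
```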

Figures (7)

  • Figure 1: The variance of independent sampling procedure (ISP) estimates is lower than that of random sampling procedure (RSP) estimates. Global estimates are shown on the X-Y plane. (a) Scatter plot of estimate errors, where "uniform" indicates RSP with uniform probability. (b) RSP($\boldsymbol{g}_i, \boldsymbol{g}_j$) and ISP($\boldsymbol{g}_i,\boldsymbol{g}_j$) denote the global estimates constructed through random sampling and independent sampling, respectively, using sampled vectors $\boldsymbol{g}_i$ and $\boldsymbol{g}_j$. "Global" indicates the full-participation result. ISP($\boldsymbol{g}_i,\boldsymbol{g}_j$) is closer to Global.
  • Figure 2: Evaluation of dynamic regret (Eq. \ref{eq:regret}), gradient variance (Eq. \ref{eq:variance}), and test loss.
  • Figure 3: Data distribution of synthetic dataset and sensitivity study on $\gamma$.
  • Figure 4: Federated EMNIST dataset experiments.
  • Figure 5: Federated text dataset experiments.
  • ...and 2 more figures

Theorems & Definitions (34)

  • Remark 2.1: Constraints on sampling probability
  • Definition 2.1: Unbiasedness of client sampling $S^t$
  • Lemma 2.1: Optimal sampling procedure, horvath2019nonconvex
  • Lemma 2.2: Optimal sampling probability, chen2020optimal
  • Example 3.1
  • Example 3.2
  • Definition 4.1: Sampling quality
  • Theorem 4.1: FedAvg with arbitrary unbiased client sampling
  • Theorem 5.1: Bound of best fixed probability
  • Lemma 5.1: Solution to Eq. \ref{obj:ol_ftrl}
  • ...and 24 more