SCAFFLSA: Taming Heterogeneity in Federated Linear Stochastic Approximation and TD Learning

Paul Mangold, Sergey Samsonov, Safwan Labbi, Ilya Levin, Reda Alami, Alexey Naumov, Eric Moulines

TL;DR

The paper analyzes how heterogeneity affects FedLSA and introduces SCAFFLSA, a bias-corrected variant using control variates to reduce communication. It provides a refined stochastic expansion that separates transient dynamics, heterogeneity bias, and fluctuations, and proves that SCAFFLSA can attain logarithmic communication complexity while preserving linear speed-up in sample complexity. The authors extend the framework to both i.i.d. and Markovian observation models and instantiate the results for federated TD learning with linear function approximation. Empirical results on Garnet environments corroborate the theory, showing that SCAFFLSA eliminates bias in heterogeneous settings and achieves faster convergence with fewer communications.
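For readers unfamiliar with how TD learning fits the linear stochastic approximation template, the following is a sketch in standard TD(0) notation. The symbols $\varphi$ (feature map), $\gamma$ (discount factor), $r^c$ (reward of agent $c$), $c$ (agent index), and $N$ (number of agents) are assumptions for illustration and may differ from the paper's own notation. Each agent $c$ has a linear system

$$\bar{A}^c \theta^c_\star = \bar{b}^c, \qquad \bar{A}^c = \mathbb{E}\big[\varphi(s)\,(\varphi(s) - \gamma \varphi(s'))^\top\big], \qquad \bar{b}^c = \mathbb{E}\big[\varphi(s)\, r^c(s)\big],$$

where $s$ is drawn from the stationary distribution of agent $c$'s environment and $s'$ is the next state. The federated target is the solution of the averaged system,

$$\Big(\frac{1}{N}\sum_{c=1}^{N} \bar{A}^c\Big)\,\theta_\star = \frac{1}{N}\sum_{c=1}^{N} \bar{b}^c,$$

and a local step with step size $\eta$ and a sample $Z$ reads $\theta \leftarrow \theta - \eta\,\big(A^c(Z)\,\theta - b^c(Z)\big)$. When the $\bar{A}^c$ differ across agents, $\theta_\star$ is in general not the average of the local solutions $\theta^c_\star$, which is where heterogeneity bias can enter.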

Abstract

In this paper, we analyze the sample and communication complexity of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the effects of local training with agent heterogeneity. We show that the communication complexity of FedLSA scales polynomially with the inverse of the desired accuracy $\epsilon$. To overcome this, we propose SCAFFLSA, a new variant of FedLSA that uses control variates to correct for client drift, and establish its sample and communication complexities. We show that for statistically heterogeneous agents, its communication complexity scales logarithmically with the desired accuracy, similar to Scaffnew. An important finding is that, compared to the existing results for Scaffnew, the sample complexity scales with the inverse of the number of agents, a property referred to as linear speed-up. Achieving this linear speed-up requires completely new theoretical arguments. We apply the proposed method to federated temporal difference learning with linear function approximation and analyze the corresponding complexity improvements.
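The effect of control variates on heterogeneity bias can be illustrated with a minimal, noise-free sketch. It assumes a Scaffold-style control-variate update, which may differ in detail from the paper's SCAFFLSA algorithm; the function names, the two-agent diagonal problem, and the parameter values are hypothetical and chosen only for illustration. Noise is omitted so that only the bias due to heterogeneity is visible.

```python
import numpy as np

def local_steps(theta, A, b, xi, eta, H):
    """Run H noise-free local LSA steps on one agent, shifted by control variate xi."""
    for _ in range(H):
        theta = theta - eta * (A @ theta - b - xi)
    return theta

def run(As, bs, eta, H, rounds, use_control_variates):
    """Federated averaging of local LSA iterates, optionally with control variates."""
    N, d = len(As), As[0].shape[0]
    theta = np.zeros(d)
    xis = [np.zeros(d) for _ in range(N)]  # control variates stay zero for plain FedLSA
    for _ in range(rounds):
        local_iterates = [
            local_steps(theta, As[c], bs[c], xis[c], eta, H) for c in range(N)
        ]
        theta = np.mean(local_iterates, axis=0)  # communication: average the agents
        if use_control_variates:
            # Assumed Scaffold-style update: move each control variate toward the
            # drift between the averaged iterate and the agent's local iterate.
            xis = [xis[c] + (theta - local_iterates[c]) / (eta * H) for c in range(N)]
    return theta

# Hypothetical heterogeneous problem with two agents (different A and b per agent).
As = [np.diag([0.5, 1.0]), np.diag([2.0, 1.0])]
bs = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]
# Target: solution of the averaged system mean(A) theta = mean(b).
theta_star = np.linalg.solve(np.mean(As, axis=0), np.mean(bs, axis=0))

theta_fed = run(As, bs, eta=0.02, H=5, rounds=500, use_control_variates=False)
theta_cv = run(As, bs, eta=0.02, H=5, rounds=500, use_control_variates=True)
err_fed = np.linalg.norm(theta_fed - theta_star)
err_cv = np.linalg.norm(theta_cv - theta_star)
print("error without control variates:", err_fed)
print("error with control variates:   ", err_cv)
```

In this toy problem the plain averaged iteration settles at a point that is offset from the target, while the control-variate version converges to the solution of the averaged system. The sketch says nothing about the stochastic regime or the linear speed-up, which are the subject of the paper's analysis.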

Paper Structure

This paper contains 31 sections, 38 theorems, 247 equations, 8 figures, 2 tables, 6 algorithms.

Key Result

Theorem 4.1

Assume assum:noise-level-flsa and assum:exp_stability. Then for any step size $\eta \in (0,\eta_{\infty})$ it holds that where the bias $\tilde{\theta}^{\sf (bi,bi)}_{t}$ converges in expectation to $\tilde{\theta}^{\sf (bi,bi)}_{\infty} = (\mathrm{I} - \avgProdDetB{H}{\eta})^{-1} \avgbias{H}$ at a geometric rate, and is uniformly bounded by $\mathbb{E}^{1/2}[\norm{\tilde{\theta}^{\sf (bi,bi)}_{t

Figures (8)

  • Figure 1: MSE as a function of the number of communication rounds for FedLSA and SCAFFLSA applied to federated TD(0) in homogeneous and heterogeneous settings, for different numbers of agents and local steps. The green dashed line is FedLSA's bias, as predicted by \ref{th:2nd_moment_no_cv}. For each algorithm, we report the average MSE and variance over $5$ runs.
  • Figure 2: MSE, averaged over $10$ runs, for the last iterates of FedLSA (dashed lines) and SCAFFLSA (solid lines) in the stationary regime, as a function of the number of agents, in different federated TD(0) problems. The black dotted line decreases as $1/N$, serving as a visual guide for linear speed-up.
  • Figure 3: MSE as a function of the number of communication rounds for FedLSA and SCAFFLSA applied to federated TD(0) in homogeneous settings with $\eta = 0.1$, for different numbers of agents ($N=10$ in the first row, $N=100$ in the second row) and different numbers of local steps. The green dashed line is FedLSA's bias, as predicted by \ref{th:2nd_moment_no_cv}. For each algorithm, we report the average MSE and variance over $5$ runs.
  • Figure 4: MSE as a function of the number of communication rounds for FedLSA and SCAFFLSA applied to federated TD(0) in heterogeneous settings with $\eta = 0.1$, for different numbers of agents ($N=10$ in the first row, $N=100$ in the second row) and different numbers of local steps. The green dashed line is FedLSA's bias, as predicted by \ref{th:2nd_moment_no_cv}. For each algorithm, we report the average MSE and variance over $5$ runs.
  • Figure 5: MSE as a function of the number of communication rounds for FedLSA and SCAFFLSA applied to federated TD(0) in homogeneous settings with $\eta = 0.01$, for different numbers of agents ($N=10$ in the first row, $N=100$ in the second row) and different numbers of local steps. The green dashed line is FedLSA's bias, as predicted by \ref{th:2nd_moment_no_cv}. For each algorithm, we report the average MSE and variance over $5$ runs.
  • ...and 3 more figures

Theorems & Definitions (67)

  • Claim 3.1
  • Theorem 4.1
  • Corollary 4.2
  • Corollary 4.3
  • Corollary 4.4
  • Corollary 4.5: \ref{cor:sample_complexity_lsa} adjusted to Markovian samples
  • Remark 4.6
  • Theorem 5.1
  • Corollary 5.2
  • Corollary 5.3
  • ...and 57 more