SCAFFLSA: Taming Heterogeneity in Federated Linear Stochastic Approximation and TD Learning

Paul Mangold; Sergey Samsonov; Safwan Labbi; Ilya Levin; Reda Alami; Alexey Naumov; Eric Moulines

TL;DR

The paper analyzes how heterogeneity affects FedLSA and introduces SCAFFLSA, a bias-corrected variant using control variates to reduce communication. It provides a refined stochastic expansion that separates transient dynamics, heterogeneity bias, and fluctuations, and proves that SCAFFLSA can attain logarithmic communication complexity while preserving linear speed-up in sample complexity. The authors extend the framework to both i.i.d. and Markovian observation models and instantiate the results for federated TD learning with linear function approximation. Empirical results on Garnet environments corroborate the theory, showing that SCAFFLSA eliminates bias in heterogeneous settings and achieves faster convergence with fewer communications.

Abstract

In this paper, we analyze the sample and communication complexity of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the effects of local training with agent heterogeneity. We show that the communication complexity of FedLSA scales polynomially with the inverse of the desired accuracy $\epsilon$. To overcome this, we propose SCAFFLSA, a new variant of FedLSA that uses control variates to correct for client drift, and establish its sample and communication complexities. We show that for statistically heterogeneous agents, its communication complexity scales logarithmically with the desired accuracy, similar to Scaffnew. An important finding is that, compared to the existing results for Scaffnew, the sample complexity scales with the inverse of the number of agents, a property referred to as linear speed-up. Achieving this linear speed-up requires completely new theoretical arguments. We apply the proposed method to federated temporal difference learning with linear function approximation and analyze the corresponding complexity improvements.
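To make the client-drift bias and its correction concrete, here is a minimal noise-free sketch in plain Python. It is an illustration under stated assumptions, not the paper's algorithm listing: the function name `fed_lsa`, the two-agent diagonal systems, and the step size are hypothetical choices, the stochastic noise is dropped so that only the heterogeneity effect remains, and the control-variate update is written in the usual controlled-averaging style, which may differ in detail from the SCAFFLSA update.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def fed_lsa(A, b, eta, H, rounds, control_variates=False):
    """Noise-free federated LSA: H local steps per agent, then averaging.

    Agent c runs theta <- theta - eta * (A[c] theta - b[c] - xi[c]).
    With control_variates=False, xi stays at zero (plain local training).
    """
    N, d = len(A), len(b[0])
    theta = [0.0] * d
    xi = [[0.0] * d for _ in range(N)]  # one control variate per agent
    for _ in range(rounds):
        local = []
        for c in range(N):
            th = list(theta)
            for _ in range(H):
                Ath = matvec(A[c], th)
                th = [th[i] - eta * (Ath[i] - b[c][i] - xi[c][i])
                      for i in range(d)]
            local.append(th)
        # Communication round: the server averages the local iterates.
        theta = [sum(local[c][i] for c in range(N)) / N for i in range(d)]
        if control_variates:
            # Assumed update: shift each xi[c] by the agent's drift
            # from the average, rescaled by eta * H.
            for c in range(N):
                for i in range(d):
                    xi[c][i] += (theta[i] - local[c][i]) / (eta * H)
    return theta


# Two heterogeneous agents (hypothetical toy systems).
A = [[[1.0, 0.0], [0.0, 2.0]],
     [[3.0, 0.0], [0.0, 1.0]]]
b = [[1.0, 0.0], [0.0, 1.0]]
# Solution of (mean of A) theta = (mean of b): diag(2, 1.5) theta = (0.5, 0.5).
theta_star = [0.5 / 2.0, 0.5 / 1.5]

print("target   ", theta_star)
print("FedLSA   ", fed_lsa(A, b, eta=0.05, H=10, rounds=200))
print("corrected", fed_lsa(A, b, eta=0.05, H=10, rounds=200,
                           control_variates=True))
```

In this toy setup, plain local training with `H=10` settles at a point that differs from `theta_star`, while the corrected variant reaches `theta_star` with the same number of communication rounds. With `H=1` the plain variant is unbiased, so here the bias comes from combining heterogeneity with multiple local steps.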

Paper Structure (31 sections, 38 theorems, 247 equations, 8 figures, 2 tables, 6 algorithms)

Introduction
Related Work
Federated Linear Stochastic Approximation and TD learning
Federated Linear Stochastic Approximation
Federated Temporal Difference Learning
Refined Analysis of the FedLSA Algorithm
Stochastic expansion for FedLSA
Convergence rate of FedLSA for i.i.d. observation model
Convergence of FedLSA under Markovian observations model
SCAFFLSA: Federated LSA with Bias Correction
Stochastic Controlled Averaging for Federated LSA
Application to Federated TD(0)
Numerical Experiments
Conclusion
Analysis of Federated Linear Stochastic Approximation
...and 16 more sections
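
The outline above lists TD learning as an instance of linear stochastic approximation. The sketch below shows the standard way TD(0) with linear features reduces to a linear system $A\theta = b$, with $A = \mathbb{E}[\varphi(s)(\varphi(s) - \gamma\varphi(s'))^\top]$ and $b = \mathbb{E}[\varphi(s) r(s)]$. The function names, the two-state chain, and the one-hot features are hypothetical. The iteration is the deterministic mean-path version (no sampling noise), so it illustrates the structure rather than the paper's stochastic algorithm.

```python
def td0_lsa_system(P, r, mu, phi, gamma):
    """Build A = E[phi(s)(phi(s) - gamma*phi(s'))^T] and b = E[phi(s) r(s)].

    P: transition matrix, r: rewards per state, mu: state distribution,
    phi: one feature vector per state, gamma: discount factor.
    """
    S, d = len(phi), len(phi[0])
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for s in range(S):
        # Expected next-state feature under the transition kernel.
        nxt = [sum(P[s][t] * phi[t][j] for t in range(S)) for j in range(d)]
        for i in range(d):
            b[i] += mu[s] * phi[s][i] * r[s]
            for j in range(d):
                A[i][j] += mu[s] * phi[s][i] * (phi[s][j] - gamma * nxt[j])
    return A, b


def lsa_mean_path(A, b, eta, steps):
    """Iterate theta <- theta - eta * (A theta - b) from theta = 0."""
    d = len(b)
    theta = [0.0] * d
    for _ in range(steps):
        resid = [sum(A[i][j] * theta[j] for j in range(d)) - b[i]
                 for i in range(d)]
        theta = [theta[i] - eta * resid[i] for i in range(d)]
    return theta


# Hypothetical two-state Markov reward process with one-hot features,
# so theta coincides with the tabular value function.
P = [[0.5, 0.5], [0.5, 0.5]]
r = [1.0, 0.0]
mu = [0.5, 0.5]          # stationary distribution of P
phi = [[1.0, 0.0], [0.0, 1.0]]

A_td, b_td = td0_lsa_system(P, r, mu, phi, gamma=0.5)
theta_td = lsa_mean_path(A_td, b_td, eta=0.5, steps=500)
print(theta_td)
```

In a federated setting, each agent would hold its own `P` and `r`, giving agent-specific `A` and `b`; that agent-to-agent difference is the heterogeneity discussed in the abstract.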

Key Result

Theorem 4.1

Assume assum:noise-level-flsa and assum:exp_stability. Then for any step size $\eta \in (0,\eta_{\infty})$ it holds that where the bias $\tilde{\theta}^{\sf (bi,bi)}_{t}$ converges in expectation to $\tilde{\theta}^{\sf (bi,bi)}_{\infty} = (\mathrm{I} - \avgProdDetB{H}{\eta})^{-1} \avgbias{H}$ at a geometric rate, and is uniformly bounded by $\mathbb{E}^{1/2}[\norm{\tilde{\theta}^{\sf (bi,bi)}_{t

Figures (8)

Figure 1: MSE as a function of the number of communication rounds for FedLSA and SCAFFLSA applied to federated TD(0) in homogeneous and heterogeneous settings, for different numbers of agents and local steps. The green dashed line is FedLSA's bias, as predicted by \ref{th:2nd_moment_no_cv}. For each algorithm, we report the average MSE and variance over $5$ runs.
Figure 2: MSE, averaged over $10$ runs, for the last iterates of FedLSA (dashed lines) and SCAFFLSA (solid lines) in the stationary regime, as a function of the number of agents, in different federated TD(0) problems. The black dotted line decreases as $1/N$, serving as a visual guide for linear speed-up.
Figure 3: MSE as a function of the number of communication rounds for FedLSA and SCAFFLSA applied to federated TD(0) in homogeneous settings with $\eta = 0.1$, for different numbers of agents ($N=10$ on the first line, $N=100$ on the second line) and different numbers of local steps. The green dashed line is FedLSA's bias, as predicted by \ref{th:2nd_moment_no_cv}. For each algorithm, we report the average MSE and variance over $5$ runs.
Figure 4: MSE as a function of the number of communication rounds for FedLSA and SCAFFLSA applied to federated TD(0) in heterogeneous settings with $\eta = 0.1$, for different numbers of agents ($N=10$ on the first line, $N=100$ on the second line) and different numbers of local steps. The green dashed line is FedLSA's bias, as predicted by \ref{th:2nd_moment_no_cv}. For each algorithm, we report the average MSE and variance over $5$ runs.
Figure 5: MSE as a function of the number of communication rounds for FedLSA and SCAFFLSA applied to federated TD(0) in homogeneous settings with $\eta = 0.01$, for different numbers of agents ($N=10$ on the first line, $N=100$ on the second line) and different numbers of local steps. The green dashed line is FedLSA's bias, as predicted by \ref{th:2nd_moment_no_cv}. For each algorithm, we report the average MSE and variance over $5$ runs.
...and 3 more figures

Theorems & Definitions (67)

Claim 3.1
Theorem 4.1
Corollary 4.2
Corollary 4.3
Corollary 4.4
Corollary 4.5: \ref{cor:sample_complexity_lsa} adjusted to Markovian samples
Remark 4.6
Theorem 5.1
Corollary 5.2
Corollary 5.3
...and 57 more
