Convergence Guarantees for Federated SARSA with Local Training and Heterogeneous Agents

Paul Mangold, Eloïse Berthier, Eric Moulines

TL;DR

This work provides the first finite-time convergence guarantees for FedSARSA in a heterogeneous federated reinforcement learning setting with local updates and linear function approximation. It introduces an exact multi-step error expansion for single-agent SARSA and extends it to FedSARSA by analyzing a unique limit point θ_* defined by a federated TD fixed-point equation, showing linear speed-up in the number of agents. The results quantify how transition and reward heterogeneity induce bias, detail the role of Markovian noise, and establish explicit sample/communication complexities. Numerical experiments corroborate the theory, illustrating linear speed-up and the impact of local-update bias in heterogeneous environments.
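The fixed point $\theta_\star$ and the local-update bias mentioned above can be illustrated on a toy, noise-free problem. This sketch is our own simplification, not the paper's construction. We freeze the policy so that each agent $c$ has a constant matrix $A_c$ and vector $b_c$ (in FedSARSA these depend on $\theta$ through the policy). We then define the federated fixed point by $\frac{1}{N}\sum_c (A_c\theta - b_c) = 0$ and run deterministic rounds of `H` local steps followed by averaging. The names `A_list`, `b_list`, `eta`, `H` and both helper functions are hypothetical.

```python
import numpy as np


def federated_fixed_point(A_list, b_list):
    """Solve mean_c(A_c) theta = mean_c(b_c) for a frozen policy (toy assumption)."""
    A_bar = np.mean(A_list, axis=0)
    b_bar = np.mean(b_list, axis=0)
    return np.linalg.solve(A_bar, b_bar)


def fedavg_mean_field(A_list, b_list, eta, H, rounds=5000):
    """Noise-free local training: each agent runs H steps of
    theta <- theta + eta * (b_c - A_c theta), then the server averages."""
    theta = np.zeros(b_list[0].shape[0])
    for _ in range(rounds):
        local = []
        for A, b in zip(A_list, b_list):
            th = theta.copy()
            for _ in range(H):
                th = th + eta * (b - A @ th)   # one expected TD-style local step
            local.append(th)
        theta = np.mean(local, axis=0)          # one communication round
    return theta


A_list = [np.array([[1.0]]), np.array([[3.0]])]
b_list = [np.array([1.0]), np.array([0.0])]
print(federated_fixed_point(A_list, b_list))          # theta_star
print(fedavg_mean_field(A_list, b_list, eta=0.1, H=1))
print(fedavg_mean_field(A_list, b_list, eta=0.1, H=2))
```

Here $\theta_\star = 1/4$. One local step per round recovers it, while two local steps with `eta=0.1` settle at $0.19/0.70 \approx 0.271$, because each agent's local iterates drift toward its own solution $A_c^{-1} b_c$. In this toy the gap vanishes when all $A_c$ are equal and shrinks as `eta` decreases.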

Abstract

We present a novel theoretical analysis of Federated SARSA (FedSARSA) with linear function approximation and local training. We establish convergence guarantees for FedSARSA in the presence of heterogeneity, both in local transitions and rewards, providing the first sample and communication complexity bounds in this setting. At the core of our analysis is a new, exact multi-step error expansion for single-agent SARSA, which is of independent interest. Our analysis precisely quantifies the impact of heterogeneity, demonstrating the convergence of FedSARSA with multiple local updates. Crucially, we show that FedSARSA achieves linear speed-up with respect to the number of agents, up to higher-order terms due to Markovian sampling. Numerical experiments support our theoretical findings.
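For concreteness, here is a minimal sketch of FedSARSA-style training with linear function approximation and local updates. Every specific choice in it is our assumption rather than the paper's setup. Each agent is a small tabular MDP whose transition kernel and rewards perturb a shared base (controlled by `hetero`). Q-values are linear in random features `phi[s, a]`. The policy is a softmax over Q-values, used here as one example of a Lipschitz policy-improvement operator. Each agent follows a single Markovian trajectory that persists across rounds, and the server averages the local parameters after `H` local SARSA steps. All function names are hypothetical.

```python
import numpy as np


def softmax_policy(theta, phi, s, tau, rng):
    """Sample an action from a softmax over Q(s, .) = phi[s] @ theta."""
    q = phi[s] @ theta                    # Q-values of all actions in state s
    z = np.exp((q - q.max()) / tau)       # shift by the max for numerical stability
    return rng.choice(len(q), p=z / z.sum())


def make_agents(n_agents, n_states, n_actions, hetero, rng):
    """Heterogeneous tabular MDPs built by perturbing a shared kernel and reward."""
    base_P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    base_r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    agents = []
    for _ in range(n_agents):
        noise_P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
        P = (1.0 - hetero) * base_P + hetero * noise_P   # convex mix stays row-stochastic
        r = np.clip(base_r + hetero * rng.normal(size=base_r.shape), 0.0, 1.0)
        agents.append((P, r))
    return agents


def fed_sarsa(agents, phi, rounds, H, eta, gamma=0.9, tau=0.5, seed=0):
    """Local SARSA(0) steps on each agent, then server-side parameter averaging."""
    rng = np.random.default_rng(seed)
    n_states, _, d = phi.shape
    theta = np.zeros(d)
    # Each agent keeps its own Markov chain across rounds (Markovian sampling).
    states = [rng.integers(n_states) for _ in agents]
    for _ in range(rounds):
        local_params = []
        for c, (P, r) in enumerate(agents):
            th = theta.copy()
            s = states[c]
            a = softmax_policy(th, phi, s, tau, rng)
            for _ in range(H):
                s_next = rng.choice(n_states, p=P[s, a])
                a_next = softmax_policy(th, phi, s_next, tau, rng)
                td = r[s, a] + gamma * phi[s_next, a_next] @ th - phi[s, a] @ th
                th = th + eta * td * phi[s, a]    # linear SARSA update
                s, a = s_next, a_next
            states[c] = s
            local_params.append(th)
        theta = np.mean(local_params, axis=0)     # one communication round
    return theta


rng = np.random.default_rng(1)
agents = make_agents(4, 5, 2, hetero=0.2, rng=rng)
phi = rng.normal(size=(5, 2, 3))
phi /= np.linalg.norm(phi, axis=-1, keepdims=True)   # unit-norm features
theta = fed_sarsa(agents, phi, rounds=50, H=5, eta=0.05)
```

Each round costs one communication regardless of `H`, which is why more local steps reduce communication. Setting `hetero=0.0` makes all agents share the same MDP, which separates the effect of Markovian noise from that of heterogeneity.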

Paper Structure

This paper contains 35 sections, 40 theorems, 189 equations, 1 figure, 1 table, 1 algorithm.

Key Result

Lemma 1

Assume that the assumptions from boundedness of $A$ and $b$ through Lipschitz improvement hold. Let $t \ge 0$ and assume that the step size satisfies $\eta_t H \mathrm{C}_A \le 1/6$. Then it holds that […], where $c_1 > 0$ is an absolute constant, $\mathrm{G} \overset{\Delta}{=} \mathrm{C}_A \widetilde{\mathrm{C}}_{\mathrm{proj}} + \mathrm{C}_b$, and $\delta = 0$ if episodes start in the stationary distribution and $\delta = 1$ otherwise.
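The step-size condition of Lemma 1 is a simple arithmetic constraint. The helpers below check it and clip a decaying schedule so that it always holds. We read `H` as the horizon appearing in the lemma (for instance, the number of consecutive local updates) and `C_A` as the constant $\mathrm{C}_A$. The excerpt does not define `H`, and the schedule `eta0 / (t + 1) ** decay` is a hypothetical choice, not one prescribed by the paper.

```python
def max_step_size(H: int, C_A: float) -> float:
    """Largest constant step size with eta * H * C_A <= 1/6."""
    if H < 1 or C_A <= 0:
        raise ValueError("need H >= 1 and C_A > 0")
    return 1.0 / (6.0 * H * C_A)


def satisfies_condition(eta: float, H: int, C_A: float, tol: float = 1e-12) -> bool:
    """Check eta * H * C_A <= 1/6, with a small tolerance for floating-point rounding."""
    return eta * H * C_A <= 1.0 / 6.0 + tol


def step_schedule(t: int, H: int, C_A: float, eta0: float = 1.0, decay: float = 0.5) -> float:
    """Hypothetical decaying schedule eta0 / (t + 1)**decay, clipped to the lemma's bound."""
    return min(eta0 / (t + 1) ** decay, max_step_size(H, C_A))


print(max_step_size(10, 2.0))   # 1/120
```

For example, with `H = 10` and `C_A = 2`, the largest admissible step size is $1/120 \approx 0.0083$, so any schedule must stay below this value for the lemma to apply.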

Figures (1)

  • Figure 1: Mean squared error as a function of the number of communications. For each run, we report two errors: (i) in solid lines, the error $\mathbb{E}[\|\theta_t - \theta_\star\|^2]$ in estimating $\theta_\star$, as defined in the proposition establishing the existence of $\theta_\star$ in the federated setting, where $\theta_t$ denotes the global parameter after $t$ communications; and (ii) in dashed lines, the error $\mathbb{E}[\|\theta_t - \chi_\star\|^2]$ in estimating $\chi_\star$, the solution obtained by running SARSA on the averaged environment. For each plot, we report the average over $10$ runs and the corresponding standard deviation.
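The curves in Figure 1 can be estimated from logged iterates as sketched below, assuming the global parameter is stored after every communication round for each independent run. Calling the helper once with $\theta_\star$ and once with $\chi_\star$ gives the solid and dashed curves. The array layout, the name `mse_curve`, and the use of the population standard deviation (`ddof=0`) are our assumptions.

```python
import numpy as np


def mse_curve(theta_runs, target):
    """theta_runs: array of shape (n_runs, n_rounds, d) holding the global
    parameter after each communication round; target: array of shape (d,).
    Returns the mean and standard deviation over runs of ||theta_t - target||^2."""
    sq_err = np.sum((theta_runs - target) ** 2, axis=-1)   # shape (n_runs, n_rounds)
    return sq_err.mean(axis=0), sq_err.std(axis=0)          # population std (ddof=0)


runs = np.array([[[1.0]], [[3.0]]])   # 2 runs, 1 round, d = 1
mean, std = mse_curve(runs, np.array([1.0]))
```

In this small example the squared errors of the two runs are 0 and 4, so the mean and the standard deviation are both 2.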

Theorems & Definitions (74)

  • Claim 1
  • Lemma 1
  • Theorem 1
  • Corollary 1
  • Remark 1
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Claim 2
  • Lemma 2
  • ...and 64 more