Parameter-Free Federated TD Learning with Markov Noise in Heterogeneous Environments

Ankur Naskar, Gugan Thoppe, Utsav Negi, Vijay Gupta

TL;DR

This work addresses federated reinforcement learning with Markovian data, where optimal TD learning rates previously required problem-parameter dependent stepsizes. It introduces parameter-free two-timescale Federated TD algorithms with Polyak-Ruppert averaging, achieving the optimal $\tilde{O}(1/(NT))$ convergence in both average-reward and discounted settings, and extending to heterogeneous MDPs. The theoretical results hinge on decoupling the slower average-reward estimate from the faster TD parameter updates via PR averaging, yielding linear speedups in the number of agents up to a heterogeneity gap. Empirical experiments on synthetic Markovian tasks validate the theory, showing competitive performance with IID baselines and robustness to heterogeneity. The findings have practical implications for scalable, privacy-preserving FRL in diverse environments, enabling parameter-free guarantees without needing problem-specific quantities.

Abstract

Federated learning (FL) can dramatically speed up reinforcement learning by distributing exploration and training across multiple agents. It can guarantee an optimal convergence rate that scales linearly in the number of agents, i.e., a rate of $\tilde{O}(1/(NT)),$ where $T$ is the iteration index and $N$ is the number of agents. However, when the training samples arise from a Markov chain, existing results on TD learning achieving this rate require the algorithm to depend on unknown problem parameters. We close this gap by proposing a two-timescale Federated Temporal Difference (FTD) learning algorithm with Polyak-Ruppert averaging. Our method provably attains the optimal $\tilde{O}(1/(NT))$ rate in both average-reward and discounted settings, offering a parameter-free FTD approach for Markovian data. Although our results are novel even in the single-agent setting, they also apply to the more realistic and challenging scenario of FL with heterogeneous environments.
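To make the ingredients named in the abstract concrete, here is a minimal sketch of a federated average-reward TD(0) loop with Polyak-Ruppert averaging. It is not the paper's AvgFedTD(0): the 2-state chain, the scalar feature, the stepsize exponents `theta_exp` and `r_exp`, and all function names are assumptions chosen for illustration, and the agents here share one environment (no heterogeneity). The only points it aims to show are that the stepsizes depend on the iteration index alone, that the average-reward estimate and the TD parameter use different stepsize schedules, and that the reported parameter is the running average of the iterates.

```python
import random

# Toy 2-state Markov reward process (hypothetical, not from the paper).
P = [[0.9, 0.1], [0.2, 0.8]]   # transition probabilities
REWARD = [1.0, 0.0]            # reward collected in each state
PHI = [1.0, 0.0]               # scalar feature per state (dimension 1)


def step(state, rng):
    """Sample the next state from row `state` of P."""
    return 0 if rng.random() < P[state][0] else 1


def fed_avg_reward_td0(num_agents=4, num_iters=20000, seed=0,
                       theta_exp=0.6, r_exp=1.0):
    """Federated average-reward TD(0) sketch with Polyak-Ruppert averaging.

    Stepsizes are (t + 1) ** -theta_exp and (t + 1) ** -r_exp; the
    exponents are illustrative assumptions, not the paper's choices.
    """
    rng = random.Random(seed)
    states = [0] * num_agents      # each agent runs its own Markov chain
    theta, r_hat = 0.0, 0.0        # shared TD parameter and reward estimate
    theta_bar = 0.0                # Polyak-Ruppert average of theta
    for t in range(num_iters):
        step_theta = (t + 1) ** (-theta_exp)  # larger step: faster timescale
        step_r = (t + 1) ** (-r_exp)          # smaller step: slower timescale
        g_theta, g_r = 0.0, 0.0
        for i in range(num_agents):
            s = states[i]
            s_next = step(s, rng)
            # Average-reward TD error for this agent's single transition.
            td_error = (REWARD[s] - r_hat
                        + PHI[s_next] * theta - PHI[s] * theta)
            g_theta += td_error * PHI[s]
            g_r += REWARD[s] - r_hat
            states[i] = s_next
        # Server step: average the agents' update directions.
        theta += step_theta * g_theta / num_agents
        r_hat += step_r * g_r / num_agents
        # Running mean of the iterates theta_1, ..., theta_{t+1}.
        theta_bar += (theta - theta_bar) / (t + 1)
    return theta_bar, r_hat
```

Calling `theta_bar, r_hat = fed_avg_reward_td0()` runs four agents for 20000 iterations. For this toy chain the stationary distribution is $(2/3, 1/3)$, so the true average reward is $2/3$ and the TD(0) fixed point for the scalar feature is $10/3$; the two returned values can be compared against these. Averaging the agents' directions before each step is what reduces the noise as the number of agents grows in this sketch.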

Paper Structure

This paper contains 9 sections, 13 theorems, 123 equations, 9 figures, 2 tables, 2 algorithms.

Key Result

Theorem 3.2

Assume that Assumptions a:ergodic through a:feature.matrix hold. Let $(\bar{\theta}_{t}, r_{t})$ be the iterates generated by AvgFedTD(0). Then, $\forall i\in [N]$ and $T > t_{*},$ where the constants $C_{r,\textnormal{quad}}, C_{r, \textnormal{lin}}, C_{\bar{\theta}, \textnormal{quad}}, C_{\bar{\theta}, \textnormal{lin}}, H_r(\varepsilon_p, \varepsilon_r),$ and $H_\theta(\varepsilon_p, \varepsilon_r)$ are as defined

Figures (9)

  • Figure 1: Comparison of our proposed parameter-free algorithms against prior work. For average reward, we compare AvgFedTD(0) (Fig. a) with the federated variant of Zhang et al. (2021) (Fig. b). Similarly, for exponential discounting, we compare ExpFedTD(0) (Fig. c) with the federated TD method of wang2024federated (Fig. d), in the setting described in the experiments section. The y-axis of each plot is the mean squared difference between the ideal parameters and the global parameters, i.e., $\mathbb{E} \|\bar{\theta}_t - \theta_{1}^{*}\|^2_{2},$ and the x-axis is the number of iterations. Our proposed parameter-free algorithms perform comparably to the methods in the literature that depend on unknown problem parameters.
  • Figure 2: Comparison of different $\beta$ values across the same number of agents executing Algorithm 1 in a heterogeneous Markovian setting.
  • Figure 3: Comparison for different $\beta$ values with a fixed set of agents executing Algorithm 2 in a heterogeneous Markovian setting.
  • Figure 4: Comparison of simulation results executing Algorithm 1 with different values of $\varepsilon_r$ in a heterogeneous Markovian setting.
  • Figure 5: Comparison of simulation results executing Algorithm 2 with different values of $\varepsilon_r$ in a heterogeneous Markovian setting.
  • ...and 4 more figures

Theorems & Definitions (18)

  • Remark 3.1
  • Theorem 3.2: AvgFedTD(0)
  • Theorem 3.3: ExpFedTD(0)
  • Remark 3.4
  • Remark 3.5
  • Remark 3.6
  • Remark 3.7
  • Lemma 4.1
  • Lemma 4.2
  • Lemma 4.3
  • ...and 8 more