Parameter-Free Federated TD Learning with Markov Noise in Heterogeneous Environments
Ankur Naskar, Gugan Thoppe, Utsav Negi, Vijay Gupta
TL;DR
This work addresses federated reinforcement learning (FRL) with Markovian data, where achieving the optimal TD learning rate previously required step sizes that depend on problem parameters. It introduces parameter-free two-timescale Federated TD algorithms with Polyak-Ruppert (PR) averaging that achieve the optimal $\tilde{O}(1/(NT))$ convergence rate in both the average-reward and discounted settings, and it extends them to heterogeneous MDPs. The theoretical results hinge on decoupling the slower average-reward estimate from the faster TD parameter updates via PR averaging. This yields a linear speedup in the number of agents, up to a heterogeneity gap. Experiments on synthetic Markovian tasks validate the theory, showing performance competitive with IID baselines and robustness to heterogeneity. The findings have practical implications for scalable, privacy-preserving FRL in diverse environments, since they provide parameter-free guarantees that do not require problem-specific quantities.
Abstract
Federated learning (FL) can dramatically speed up reinforcement learning by distributing exploration and training across multiple agents. It can guarantee an optimal convergence rate with a speedup that is linear in the number of agents, i.e., a rate of $\tilde{O}(1/(NT))$, where $T$ is the iteration index and $N$ is the number of agents. However, when the training samples arise from a Markov chain, existing TD learning results that achieve this rate require the algorithm to depend on unknown problem parameters. We close this gap by proposing a two-timescale Federated Temporal Difference (FTD) learning algorithm with Polyak-Ruppert averaging. Our method provably attains the optimal $\tilde{O}(1/(NT))$ rate in both the average-reward and discounted settings, offering a parameter-free FTD approach for Markovian data. Our results are novel even in the single-agent setting, and they apply to the more realistic and challenging scenario of FL with heterogeneous environments.
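The abstract does not state the update rules, so the following is only a minimal sketch under our own assumptions, not the authors' algorithm: a generic two-timescale federated TD(0) for the average-reward setting with linear features. Each of the $N$ agents follows its own (heterogeneous) Markov chain without resets, so samples carry Markov noise. It updates a TD parameter $\theta$ with step size $\alpha_t = \alpha_0/(t+1)^{0.6}$ (faster timescale) and an average-reward estimate $\eta$ with $\beta_t = 1/(t+1)$ (slower timescale). A server averages both every few local steps, and $\bar{\theta}$ is the Polyak-Ruppert average of the server iterates. The step sizes use fixed exponents only, with no mixing time or eigenvalue as input, which mirrors the abstract's sense of "parameter-free". The helper names `make_chain`, `stationary`, and `fed_avg_reward_td`, the exponents, `alpha0`, and the random test environment are all hypothetical choices made for illustration.

```python
import random


def make_chain(n_states, rng):
    """Hypothetical test environment: a random chain with strictly positive
    transition probabilities (hence ergodic) and per-state rewards in [0, 1)."""
    P = []
    for _ in range(n_states):
        w = [rng.random() + 0.1 for _ in range(n_states)]
        total = sum(w)
        P.append([x / total for x in w])
    r = [rng.random() for _ in range(n_states)]
    return P, r


def stationary(P, iters=500):
    """Stationary distribution of P by power iteration (P strictly positive)."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[s] * P[s][s2] for s in range(n)) for s2 in range(n)]
    return mu


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fed_avg_reward_td(chains, features, T, local_steps, alpha0=0.2, seed=0):
    """Sketch (assumed design, not the paper's): two-timescale federated
    average-reward TD(0) with linear features, periodic server averaging,
    and Polyak-Ruppert averaging of the server's theta."""
    rng = random.Random(seed)
    n_agents, d = len(chains), len(features[0])
    theta, eta = [0.0] * d, 0.0   # server iterates
    theta_bar = [0.0] * d         # PR average of server theta over rounds
    states = [0] * n_agents       # chains are never reset (Markov noise)
    for k in range(T // local_steps):
        thetas, etas = [], []
        for i, (P, r) in enumerate(chains):
            th, et, s = theta[:], eta, states[i]
            for j in range(local_steps):
                t = k * local_steps + j
                alpha = alpha0 / (t + 1) ** 0.6  # faster timescale: TD parameter
                beta = 1.0 / (t + 1)             # slower timescale: average reward
                s_next = rng.choices(range(len(P)), weights=P[s])[0]
                # Average-reward TD error uses the current estimate eta.
                delta = r[s] - et + dot(th, features[s_next]) - dot(th, features[s])
                th = [a + alpha * delta * f for a, f in zip(th, features[s])]
                et += beta * (r[s] - et)
                s = s_next
            states[i] = s
            thetas.append(th)
            etas.append(et)
        theta = [sum(col) / n_agents for col in zip(*thetas)]  # server averaging
        eta = sum(etas) / n_agents
        theta_bar = [b + (x - b) / (k + 1) for b, x in zip(theta_bar, theta)]
    return theta_bar, eta


if __name__ == "__main__":
    rng = random.Random(1)
    n_states, d = 5, 3
    chains = [make_chain(n_states, rng) for _ in range(4)]  # heterogeneous agents
    features = [[rng.random() for _ in range(d)] for _ in range(n_states)]
    theta_bar, eta = fed_avg_reward_td(chains, features, T=10000, local_steps=10)
    target = sum(dot(stationary(P), r) for P, r in chains) / len(chains)
    print("eta:", round(eta, 4), "mean stationary reward:", round(target, 4))
```

With $\beta_t = 1/(t+1)$ and synchronized local clocks, the server's $\eta$ after each round equals the empirical mean of all rewards observed by all agents so far. It therefore approaches the mean of the agents' stationary average rewards. In a heterogeneous federation, that mean is the quantity the server tracks, not any single agent's average reward.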
