Finite-Time Analysis of On-Policy Heterogeneous Federated Reinforcement Learning

Chenyu Zhang, Han Wang, Aritra Mitra, James Anderson

TL;DR

This paper tackles the finite-time performance of federated on-policy reinforcement learning in heterogeneous MDPs by introducing FedSARSA, a SARSA-based FRL algorithm with linear function approximation. It establishes a perturbation bound for cross-agent optimality, proves a finite-time error bound that yields linear speedups with the number of agents under both fixed and decaying step-sizes, and characterizes a convergence region whose radius scales with environmental heterogeneity. The analysis leverages a mean-path semi-gradient framework, a contraction property of the nonlinear projected Bellman equation, and a backtracking technique to handle nonstationarity and Markovian sampling. Empirical results corroborate the theory, showing robust performance under heterogeneity and demonstrating the practical viability of federated collaboration for faster policy learning in multi-environment settings.
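The TL;DR describes FedSARSA as SARSA with linear function approximation, run locally by each agent on its own environment and combined across agents. Below is a minimal sketch of that loop. It rests on several assumptions:

- finite MDPs whose transition arrays are used only to simulate Markovian sampling;
- a fixed feature map `phi`;
- a softmax behavior policy over linear action values, with a hypothetical temperature `temp`;
- a constant step size;
- plain parameter averaging at the server.

The function names and hyperparameters are illustrative. They are not the paper's exact policy-improvement operator or settings.

```python
import numpy as np


def softmax_policy(theta, phi, s, temp):
    """Action probabilities at state s from linear action values phi[s] @ theta.

    The softmax form and the temperature `temp` are illustrative assumptions.
    """
    prefs = phi[s] @ theta / temp  # shape (A,)
    prefs = prefs - prefs.max()  # numerical stability
    probs = np.exp(prefs)
    return probs / probs.sum()


def fedsarsa_sketch(P_list, R_list, phi, gamma=0.9, alpha=0.05,
                    rounds=50, local_steps=10, temp=1.0, seed=0):
    """Hypothetical FedSARSA-style loop: local SARSA(0) updates, then averaging.

    P_list[i]: agent i's kernel, shape (S, A, S); R_list[i]: rewards, shape (S, A);
    phi: features, shape (S, A, d). Each agent follows its own Markov chain.
    """
    rng = np.random.default_rng(seed)
    S, A, d = phi.shape
    n_agents = len(P_list)
    theta = np.zeros(d)  # shared (server) parameter
    states = [int(rng.integers(S)) for _ in range(n_agents)]
    actions = [int(rng.choice(A, p=softmax_policy(theta, phi, s, temp)))
               for s in states]
    for _ in range(rounds):
        local_params = []
        for i in range(n_agents):
            th = theta.copy()  # local updates start from the global model
            s, a = states[i], actions[i]
            for _ in range(local_steps):
                s_next = int(rng.choice(S, p=P_list[i][s, a]))
                # On-policy: the next action comes from the current local policy
                a_next = int(rng.choice(A, p=softmax_policy(th, phi, s_next, temp)))
                td_error = (R_list[i][s, a] + gamma * phi[s_next, a_next] @ th
                            - phi[s, a] @ th)
                th = th + alpha * td_error * phi[s, a]  # semi-gradient step
                s, a = s_next, a_next
            states[i], actions[i] = s, a  # each chain continues across rounds
            local_params.append(th)
        theta = np.mean(local_params, axis=0)  # server averages local parameters
    return theta
```

In this sketch, communication is saved by averaging only once every `local_steps` updates. Setting `local_steps=1` would average after every SARSA step.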

Abstract

Federated reinforcement learning (FRL) has emerged as a promising paradigm for reducing the sample complexity of reinforcement learning tasks by exploiting information from different agents. However, when each agent interacts with a potentially different environment, little to nothing is known theoretically about the non-asymptotic performance of FRL algorithms. The lack of such results can be attributed to various technical challenges and their intricate interplay: Markovian sampling, linear function approximation, multiple local updates to save communication, heterogeneity in the reward functions and transition kernels of the agents' MDPs, and continuous state-action spaces. Moreover, in the on-policy setting, the behavior policies vary with time, further complicating the analysis. In response, we introduce FedSARSA, a novel federated on-policy reinforcement learning scheme, equipped with linear function approximation, to address these challenges and provide a comprehensive finite-time error analysis. Notably, we establish that FedSARSA converges to a policy that is near-optimal for all agents, with the extent of near-optimality proportional to the level of heterogeneity. Furthermore, we prove that FedSARSA leverages agent collaboration to enable linear speedups as the number of agents increases, which holds for both fixed and adaptive step-size configurations.
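Informally, the linear-speedup claim rests on one fact: averaging the updates of $N$ agents divides the variance of the noise in each update by $N$. The toy simulation below illustrates only that mechanism. It assumes i.i.d. Gaussian noise around a common value. It does not model Markovian sampling, local drift, or heterogeneity, which are exactly what the paper's analysis has to handle. The function name `averaged_mse` is hypothetical.

```python
import numpy as np


def averaged_mse(num_agents, trials=20000, noise_std=1.0, seed=0):
    """MSE of the average of `num_agents` noisy estimates of a common value.

    Toy model (assumption): each agent observes true_value + i.i.d. Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    true_value = 0.5
    samples = true_value + noise_std * rng.standard_normal((trials, num_agents))
    server_avg = samples.mean(axis=1)  # what the server would compute
    return float(np.mean((server_avg - true_value) ** 2))


for n in (1, 2, 4, 8):
    # Roughly noise_std**2 / n: the error shrinks linearly in the number of agents
    print(n, round(averaged_mse(n), 4))
```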

Paper Structure (46 sections, 16 theorems, 217 equations, 7 figures, 4 tables, 1 algorithm)

Key Result

theorem 1

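There exist positive problem-dependent constants $w$, $H$, and $\sigma$ such that the distance between any two agents' SARSA fixed points is bounded in terms of $\epsilon_p$ and $\epsilon_r$, the perturbation bounds on the environmental models introduced in Definitions 1 and 2 (transition kernel and reward heterogeneity).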
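The sketch below makes $\epsilon_p$ and $\epsilon_r$ concrete by computing one plausible pair of heterogeneity measures for a collection of finite MDPs:

- $\epsilon_p$: the largest total-variation distance between any two agents' next-state distributions;
- $\epsilon_r$: the largest absolute difference between any two agents' rewards.

Both are maximized over state-action pairs. These particular measures are assumptions for illustration, and the paper's Definitions 1 and 2 may use different norms or normalizations. The function name `heterogeneity` is hypothetical.

```python
import itertools

import numpy as np


def heterogeneity(P_list, R_list):
    """Return (eps_p, eps_r) for agents' kernels (S, A, S) and rewards (S, A).

    Assumed measures: maximum pairwise total-variation distance between kernels
    and maximum pairwise absolute reward gap, over all state-action pairs.
    """
    eps_p, eps_r = 0.0, 0.0
    for (P_i, R_i), (P_j, R_j) in itertools.combinations(zip(P_list, R_list), 2):
        tv = 0.5 * np.abs(P_i - P_j).sum(axis=-1)  # TV distance per (s, a)
        eps_p = max(eps_p, float(tv.max()))
        eps_r = max(eps_r, float(np.abs(R_i - R_j).max()))
    return eps_p, eps_r
```

Identical agents give `(0.0, 0.0)`. Two agents whose kernels send every state-action pair to different deterministic next states give the maximal $\epsilon_p = 1$.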

Figures (7)

  • Figure 1: Performance of FedSARSA under Markovian sampling.
  • Figure 2: Performance of FedSARSA under Markovian sampling for varying reward heterogeneity and numbers of agents with fixed kernel heterogeneity ($\epsilon_p=1$).
  • Figure 3: Performance of FedSARSA under Markovian sampling for varying kernel heterogeneity and numbers of agents with fixed reward heterogeneity ($\epsilon_r=1$).
  • Figure 4: Effect of reward heterogeneity on the performance of FedSARSA.
  • Figure 5: Effect of kernel heterogeneity on the performance of FedSARSA.
  • ...and 2 more figures

Theorems & Definitions (41)

  • definition 1: Transition kernel heterogeneity
  • definition 2: Reward heterogeneity
  • theorem 1: Perturbation bounds on SARSA fixed points
  • theorem 2: One-step progress
  • corollary 1: Finite-time error bound for constant step-size
  • corollary 2: Finite-time error bound for decaying step-size
  • proposition 1
  • proof
  • definition 3: Steady distributions
  • definition 4: Semi-gradients
  • ...and 31 more
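Two of the listed definitions, steady distributions and semi-gradients, can be illustrated for a single finite MDP under a fixed policy. The sketch below does this in two steps. First, it computes the stationary distribution $\mu$ of the state-action chain induced by the policy. Second, it computes the expected SARSA semi-gradient under that distribution:

$\bar g(\theta)=\mathbb{E}_{(s,a)\sim\mu,\,s'\sim P,\,a'\sim\pi}\big[\phi(s,a)\big(r(s,a)+\gamma\,\phi(s',a')^\top\theta-\phi(s,a)^\top\theta\big)\big]$

The softmax policy, the eigenvector-based solver, and all function names are assumptions for illustration. The paper's own definitions must also handle behavior policies that change over time, which this fixed-policy sketch ignores.

```python
import numpy as np


def softmax_table(theta, phi, temp=1.0):
    """Policy table pi[s, a] from linear action values (softmax is an assumption)."""
    prefs = phi @ theta / temp  # shape (S, A)
    prefs = prefs - prefs.max(axis=1, keepdims=True)
    probs = np.exp(prefs)
    return probs / probs.sum(axis=1, keepdims=True)


def steady_distribution(P, pi):
    """Stationary distribution mu[s, a] of the chain (s, a) -> (s', a')."""
    S, A, _ = P.shape
    # M[(s, a), (s', a')] = P(s' | s, a) * pi(a' | s')
    M = (P[:, :, :, None] * pi[None, None, :, :]).reshape(S * A, S * A)
    evals, evecs = np.linalg.eig(M.T)  # left eigenvectors of M
    k = np.argmin(np.abs(evals - 1.0))  # eigenvalue closest to 1
    mu = np.real(evecs[:, k])
    mu = mu / mu.sum()  # normalize to a distribution (also fixes the sign)
    return mu.reshape(S, A)


def expected_semi_gradient(theta, P, R, phi, gamma=0.9, temp=1.0):
    """Mean SARSA semi-gradient under the steady distribution of the induced policy."""
    pi = softmax_table(theta, phi, temp)
    mu = steady_distribution(P, pi)
    q = phi @ theta  # q[s, a] = phi(s, a)^T theta
    v_next = (pi * q).sum(axis=1)  # E_{a' ~ pi(.|s')} q(s', a')
    td = R + gamma * (P @ v_next) - q  # expected TD error per (s, a)
    return np.einsum("sa,sa,sad->d", mu, td, phi)
```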