
Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments

Han Wang, Sihong He, Zhili Zhang, Fei Miao, James Anderson

TL;DR

This work addresses Federated Reinforcement Learning (FRL) under arbitrarily heterogeneous environments by introducing FedSVRPG-M and FedHAPG-M, two momentum-based algorithms that learn a single common policy without sharing trajectories. By integrating variance reduction and Hessian information into federated policy gradients, the authors prove exact convergence to an $\varepsilon$-first-order stationary point ($\varepsilon$-FOSP) of the average objective and establish a sample complexity of $O(\varepsilon^{-3/2}/N)$ per agent, with linear speedup in the number of agents $N$. The methods use a constant number of local steps, single-trajectory sampling, and a privacy-preserving communication pattern, making them practical for large-scale, heterogeneous FRL deployments. Empirical results in tabular and MuJoCo environments corroborate the theory, showing robustness to heterogeneity, improved performance, and evidence of linear scalability with $N$. Overall, the paper advances FRL by removing the bounded-heterogeneity assumption and enabling scalable collaboration across diverse environments.
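To make the $O(\varepsilon^{-3/2}/N)$ rate concrete, the small helper below evaluates the bound with the constant hidden by $O(\cdot)$ set to 1. That constant is an assumption here (the real one depends on problem parameters), so only ratios between calls are meaningful. The point it illustrates is what "linear speedup" means: doubling $N$ halves the per-agent trajectory count, while the total summed over all agents stays at $\varepsilon^{-3/2}$.

```python
import math


def per_agent_samples(eps: float, n_agents: int, const: float = 1.0) -> float:
    """Per-agent trajectory count suggested by an O(eps^(-3/2) / N) bound.

    `const` is a placeholder (assumption) for the problem-dependent constant
    hidden by O(.); it defaults to 1, so only ratios between calls matter.
    """
    return const * math.pow(eps, -1.5) / n_agents


def total_samples(eps: float, n_agents: int, const: float = 1.0) -> float:
    """Trajectories summed over all agents; independent of N (linear speedup)."""
    return n_agents * per_agent_samples(eps, n_agents, const)


if __name__ == "__main__":
    # Per-agent cost shrinks as 1/N for a fixed target accuracy eps.
    for n in (1, 10, 100):
        print(n, per_agent_samples(0.01, n))
```

For example, $\varepsilon = 0.01$ gives $\varepsilon^{-3/2} = 1000$, so with $N = 10$ each agent needs on the order of 100 trajectories, up to the hidden constant.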

Abstract

We explore a Federated Reinforcement Learning (FRL) problem where $N$ agents collaboratively learn a common policy without sharing their trajectory data. To date, existing FRL work has primarily focused on agents operating in the same or "similar" environments. In contrast, our problem setup allows for arbitrarily large levels of environment heterogeneity. To obtain the optimal policy, which maximizes the average performance across all environments, even completely different ones, we propose two algorithms: FedSVRPG-M and FedHAPG-M. In contrast to existing results, we demonstrate that both FedSVRPG-M and FedHAPG-M, which leverage momentum mechanisms, converge exactly to a stationary point of the average performance function, regardless of the magnitude of environment heterogeneity. Furthermore, by incorporating variance-reduction techniques or Hessian approximation, both algorithms achieve state-of-the-art convergence results, characterized by a sample complexity of $\mathcal{O}\left(\varepsilon^{-\frac{3}{2}}/N\right)$. Notably, our algorithms enjoy linear convergence speedups with respect to the number of agents, highlighting the benefit of collaboration among agents in finding a common policy.
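The abstract layers two ingredients on top of momentum: variance reduction (FedSVRPG-M) and Hessian approximation (FedHAPG-M). As a rough illustration of what these terms usually mean in the policy-gradient literature, the sketch below shows two generic single-agent momentum estimators: a STORM-style recursive momentum that corrects with an importance-weighted gradient difference, and a Hessian-aided variant that replaces that difference with a Hessian-vector product. This is not taken from the paper; the function names, plain-list vectors, and exact update forms are assumptions, and the paper's algorithms may differ in details (for example, how local and global steps interleave).

```python
import math
from typing import List, Sequence

Vector = List[float]


def importance_weight(logp_old: Sequence[float], logp_new: Sequence[float]) -> float:
    """w = p(tau | theta_old) / p(tau | theta_new) for tau sampled at theta_new.

    Environment transition probabilities appear in numerator and denominator
    and cancel, so only per-step action log-probabilities log pi(a_h | s_h)
    under each parameter vector are needed.
    """
    return math.exp(sum(logp_old) - sum(logp_new))


def svr_momentum(u_prev: Vector, g_new: Vector, g_old: Vector,
                 w: float, beta: float) -> Vector:
    """Assumed STORM-style variance-reduced momentum:

    u_t = beta * g(tau | theta_t)
          + (1 - beta) * (u_{t-1} + g(tau | theta_t) - w * g(tau | theta_{t-1}))

    g_new and g_old are evaluated on the SAME trajectory tau (sampled at
    theta_t); w keeps w * g_old unbiased for the gradient at theta_{t-1}.
    """
    return [beta * gn + (1.0 - beta) * (u + gn - w * go)
            for u, gn, go in zip(u_prev, g_new, g_old)]


def hessian_aided_momentum(u_prev: Vector, g_new: Vector, hvp: Vector,
                           beta: float) -> Vector:
    """Assumed Hessian-aided momentum:

    u_t = beta * g(tau | theta_t) + (1 - beta) * (u_{t-1} + H_t (theta_t - theta_{t-1}))

    hvp is a stochastic Hessian-vector product H_t (theta_t - theta_{t-1}),
    standing in for the importance-weighted gradient difference above.
    """
    return [beta * gn + (1.0 - beta) * (u + h)
            for u, gn, h in zip(u_prev, g_new, hvp)]
```

Setting `beta = 1` recovers the plain single-trajectory estimate `g_new` in both cases; smaller `beta` lets the correction term carry gradient information across steps, which is where the variance reduction comes from.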

Paper Structure (28 sections, 15 theorems, 97 equations, 2 figures, 3 tables, 2 algorithms)

Key Result

Theorem 6.4

(FedSVRPG-M) Under the standing assumptions (assume_policy through assume_IS), let $u_0=\frac{1}{N B} \sum_{i=1}^N \sum_{b=1}^B g_i\left(\tau_b^{(i)}|\theta_0\right)$ with $B=\left\lceil\frac{K}{R \beta^2}\right\rceil$ and $\left\{\tau^{(i)}_b\right\}_{b=1}^B \stackrel{\text{iid}}{\sim} p^{(i)}(\tau | \theta_0)$. There exists a … where $\Delta \triangleq J\left(\theta^*\right)-J(\theta_0)$, $G_0 \triangleq \frac{1}{N} \sum_{i=1}^{N} \ldots$
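The initialization in Theorem 6.4 can be read as a recipe: pick $B = \lceil K/(R\beta^2) \rceil$, let every agent draw $B$ i.i.d. trajectories at $\theta_0$, and average all $NB$ gradient estimates into $u_0$. The sketch below runs everything in one process for illustration; in a federated deployment each agent would form its own local average and only that vector (never the trajectories) would be sent for aggregation. The `samplers` callables are hypothetical stand-ins for an agent's trajectory sampler plus gradient estimator.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]


def init_batch_size(K: int, R: int, beta: float) -> int:
    """B = ceil(K / (R * beta**2)), the initial batch size in Theorem 6.4."""
    return math.ceil(K / (R * beta ** 2))


def initial_momentum(samplers: List[Callable[[], Vector]], B: int) -> Vector:
    """u_0 = 1/(N*B) * sum_i sum_b g_i(tau_b^(i) | theta_0).

    samplers[i] is a hypothetical callable: each call draws one trajectory
    tau ~ p^(i)(. | theta_0) in agent i's environment and returns g_i(tau | theta_0).
    """
    n = len(samplers)
    total: Optional[Vector] = None
    for sample_grad in samplers:      # one entry per agent i = 1..N
        for _ in range(B):            # B i.i.d. trajectories per agent
            g = sample_grad()
            total = list(g) if total is None else [t + x for t, x in zip(total, g)]
    assert total is not None, "need at least one agent and B >= 1"
    return [t / (n * B) for t in total]
```

For example, with $K = 10$ local steps, $R = 5$ rounds and $\beta = 0.5$, the formula gives $B = \lceil 10 / 1.25 \rceil = 8$ trajectories per agent; a smaller momentum parameter $\beta$ requires a larger initial batch.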

Figures (2)

  • Figure 1: Mean rewards over global iterations for the CartPole and HalfCheetah tasks: (Top): FedSVRPG-M; (Bottom): FedHAPG-M.
  • Figure 2: Mean rewards over global iterations for the CartPole task under different values of $N$ (number of agents): (Left): FedSVRPG-M; (Right): FedHAPG-M. The shaded areas represent the variance of rewards. Consistent with the theory, increasing $N$ increases the rewards. For both algorithms, the local step-size $\eta$ is $0.05$, the global step-size $\lambda$ satisfies $\lambda = \eta K$, and the number of local updates $K$ is $10$.
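The caption ties the global step-size to the local one via $\lambda = \eta K$. One common way such a pair of step sizes is used, assumed here rather than taken from the paper, is for the server to normalize each agent's displacement after $K$ local steps by $\eta K$, average these directions, and move the global parameters by $\lambda$ along the result. Under that assumption, $\lambda = \eta K$ makes the server step reduce to plain averaging of the local models. The function below is a hypothetical sketch of that rule.

```python
from typing import List

Vector = List[float]


def server_update(theta: Vector, local_thetas: List[Vector],
                  eta: float, K: int, lam: float) -> Vector:
    """Hypothetical server step with local step-size eta and global step-size lam.

    Assumption: each agent returns its parameters after K local steps of size
    eta. The server averages the displacements, converts the average into a
    per-step direction by dividing by eta * K, and moves theta by lam along it.
    """
    n = len(local_thetas)
    new_theta = []
    for j, t in enumerate(theta):
        # Average displacement of coordinate j across the N agents.
        avg_disp = sum(local[j] - t for local in local_thetas) / n
        # Normalize by eta * K and take a global step of size lam.
        new_theta.append(t + lam * avg_disp / (eta * K))
    return new_theta
```

With the caption's values ($\eta = 0.05$, $K = 10$), $\lambda = 0.5$, and under this assumed rule the new global parameters are exactly the mean of the agents' local parameters.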

Theorems & Definitions (24)

  • Theorem 6.4
  • Theorem 6.5
  • Corollary 6.6
  • Proposition 1
  • Lemma 2.1
  • proof
  • Lemma 2.2
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • ...and 14 more