Single-Loop Federated Actor-Critic across Heterogeneous Environments

Ye Zhu, Xiaowen Gong

TL;DR

SFAC introduces a two-level federated actor-critic framework to learn a single global policy across heterogeneous environments. It decomposes learning into FedC for federated TD-based critic evaluation and FedA for federated policy improvement, operating under a mixture environment to reflect agent heterogeneity. The paper proves finite-time convergence to a near-stationary point, with the convergence error scaling with environment heterogeneity and a linear speedup in sample complexity as the number of agents, $N$, increases. Empirical results on standard RL benchmarks demonstrate improved performance and faster convergence compared to baselines.
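The two-level structure described above can be illustrated with a small tabular simulation. The sketch below is a minimal illustration under stated assumptions, not the authors' SFAC implementation. Each agent has its own hypothetical transition kernel, built by perturbing a shared base kernel, and rewards are shared. The actor is a tabular softmax policy and the critic is a tabular value function updated by TD(0). Actor and critic are updated in the same loop iteration, with the TD error serving as the advantage signal. After a fixed number of local steps, the server simply averages the actor and critic parameters. The function names `make_envs` and `federated_actor_critic`, the `spread` parameter, and all step sizes are hypothetical.

```python
import numpy as np

def make_envs(n_agents, n_states, n_actions, rng, spread=0.1):
    """Hypothetical heterogeneous environments: one shared base kernel,
    perturbed per agent. P[s, a] is a distribution over next states."""
    base = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    envs = []
    for _ in range(n_agents):
        noise = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
        envs.append((1 - spread) * base + spread * noise)  # convex mix keeps rows stochastic
    reward = rng.uniform(0.0, 1.0, size=(n_states, n_actions))  # shared reward (assumption)
    return envs, reward

def softmax_policy(theta, s):
    """Tabular softmax policy pi(.|s) from logits theta[s]."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def federated_actor_critic(envs, reward, rounds=200, local_steps=10,
                           alpha=0.05, beta=0.1, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    n_states, n_actions = reward.shape
    theta = np.zeros((n_states, n_actions))  # global actor: softmax logits
    v = np.zeros(n_states)                    # global critic: tabular V
    states = [0] * len(envs)                  # each agent's trajectory is never reset
    for _ in range(rounds):                   # communication rounds
        local_thetas, local_vs = [], []
        for i, P in enumerate(envs):
            th, w, s = theta.copy(), v.copy(), states[i]
            for _ in range(local_steps):      # single loop: actor and critic step together
                pi = softmax_policy(th, s)
                a = rng.choice(n_actions, p=pi)
                s_next = rng.choice(n_states, p=P[s, a])
                delta = reward[s, a] + gamma * w[s_next] - w[s]  # TD error
                w[s] += beta * delta                  # critic: TD(0) step
                grad_log_pi = -pi
                grad_log_pi[a] += 1.0                 # d log pi(a|s) / d theta[s]
                th[s] += alpha * delta * grad_log_pi  # actor: TD error as advantage
                s = s_next
            states[i] = s
            local_thetas.append(th)
            local_vs.append(w)
        theta = np.mean(local_thetas, axis=0)  # actor federation (simple averaging)
        v = np.mean(local_vs, axis=0)          # critic federation (simple averaging)
    return theta, v

envs, reward = make_envs(4, 5, 3, np.random.default_rng(1))
theta, v = federated_actor_critic(envs, reward, rounds=100)
```

In this sketch the critic step size `beta` is larger than the actor step size `alpha`. This is a common actor-critic choice that lets the critic track the slowly changing policy. The step-size conditions actually used in the paper's analysis may differ.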

Abstract

Federated reinforcement learning (FRL) has emerged as a promising paradigm that enables multiple agents to collaborate and learn a shared policy adaptable across heterogeneous environments. Among reinforcement learning (RL) algorithms, the actor-critic (AC) algorithm stands out for its low variance and high sample efficiency. However, little is known theoretically about AC in a federated setting, especially when each agent interacts with a potentially different environment. The lack of such results stems from several technical challenges: a two-level structure in which the actor and the critic are coupled, heterogeneous environments, Markovian sampling, and multiple local updates. In response, we study Single-loop Federated Actor Critic (SFAC), in which agents perform actor-critic learning in a two-level federated manner while interacting with heterogeneous environments. We then provide bounds on the convergence error of SFAC. The results show that SFAC asymptotically converges to a near-stationary point, with a residual error proportional to the level of environment heterogeneity. Moreover, the sample complexity exhibits a linear speed-up through the federation of agents. We evaluate SFAC through numerical experiments on common RL benchmarks, which demonstrate its effectiveness.
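The linear speed-up claim rests on a familiar mechanism. When the server averages $N$ independent, unbiased local estimates, the variance of the average shrinks by a factor of $N$, so each agent needs roughly $1/N$ as many samples for the same accuracy. The sketch below checks this numerically under idealized assumptions: i.i.d. Gaussian noise around a single true gradient. It deliberately ignores the Markovian sampling and environment heterogeneity that the paper's analysis handles. `averaged_estimate_variance` is a hypothetical helper.

```python
import numpy as np

def averaged_estimate_variance(n_agents, n_rounds=20000, noise_std=1.0, seed=0):
    """Empirical variance of the server's average of n_agents noisy local estimates.

    Idealized model (assumption): every agent returns the same true gradient plus
    independent Gaussian noise; no Markovian correlation, no heterogeneity."""
    rng = np.random.default_rng(seed)
    true_grad = 1.0
    local = true_grad + noise_std * rng.standard_normal((n_rounds, n_agents))
    server_avg = local.mean(axis=1)          # one averaged estimate per round
    return server_avg.var()

for n in (1, 2, 4, 8, 16):
    print(f"N={n:2d}  variance={averaged_estimate_variance(n):.4f}  1/N={1 / n:.4f}")
```

The printed variances should track $1/N$ closely. With heterogeneous environments, averaging reduces only the noise and not the bias between agents. This is why the paper's bound also contains a heterogeneity-dependent term.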

Paper Structure

This paper contains 18 sections, 9 theorems, 81 equations, 2 figures, 1 table, 3 algorithms.

Key Result

Proposition 1

For any policy $\pi_{\theta_k}$, let $T$ denote the number of communication rounds for the critics' federation. Consider FedC (Algorithm 2) and suppose Assumptions 1 and 3 hold. Then we have: … where $\lambda$, $C_1$, $C_2$, $C_3$ and $C_4$ are positive problem-dependent constants whose detailed definitions are provided in the appendix. Note that when the heterogeneity level $\kappa^2$ …
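The following minimal sketch makes the setting of Proposition 1 concrete. It is a FedC-style critic built on assumptions that go beyond the text. The policy $\pi_{\theta_k}$ is held fixed, so each agent $i$ reduces to a Markov chain with its own hypothetical transition matrix. The critic is linear, $V(s) = \phi(s)^\top w$. Between communication rounds, each agent runs `local_steps` TD(0) updates along its own never-reset trajectory (Markovian sampling), and the server then averages the local parameters. `fedc_td0` and its parameters are illustrative names, not taken from Algorithm 2.

```python
import numpy as np

def fedc_td0(kernels, reward, features, rounds=500, local_steps=10,
             beta=0.05, gamma=0.5, seed=0):
    """Federated TD(0) with a linear critic V(s) = features[s] @ w.

    kernels[i]: agent i's state-transition matrix under the fixed policy (assumed).
    reward[s]:  expected one-step reward in state s under that policy."""
    rng = np.random.default_rng(seed)
    n_states, dim = features.shape
    w = np.zeros(dim)                      # global critic parameters
    states = [0] * len(kernels)            # Markovian sampling: chains are never reset
    for _ in range(rounds):                # communication rounds (T in the text)
        local = []
        for i, P in enumerate(kernels):
            w_i, s = w.copy(), states[i]
            for _ in range(local_steps):   # local TD(0) updates between rounds
                s_next = rng.choice(n_states, p=P[s])
                delta = reward[s] + gamma * features[s_next] @ w_i - features[s] @ w_i
                w_i += beta * delta * features[s]
                s = s_next
            states[i] = s
            local.append(w_i)
        w = np.mean(local, axis=0)         # server averages the local critics
    return w

# Homogeneous sanity check with tabular (one-hot) features.
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(3), size=3)     # one shared 3-state kernel
r = np.array([1.0, 0.0, 0.5])
phi = np.eye(3)
w = fedc_td0([P] * 4, r, phi)
v_true = np.linalg.solve(np.eye(3) - 0.5 * P, r)  # exact V = (I - gamma P)^{-1} r
```

In this homogeneous case, the averaged critic should approach the exact tabular value function $(I - \gamma P)^{-1} r$. When the agents' kernels differ, no single value function is shared by all agents. The averaged critic can then only be expected to approximate one associated with the mixture environment, with a gap that grows with heterogeneity.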

Figures (2)

  • Figure 1: SFAC Performance in Comparison to A3C
  • Figure 2: Performance of SFAC

Theorems & Definitions (13)

  • Proposition 1
  • Remark 1
  • Theorem 1
  • Remark 2
  • Lemma 1
  • Proof
  • Lemma 2
  • Proof
  • Lemma 3
  • Lemma 4
  • ...and 3 more