Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning

Tong Yang; Shicong Cen; Yuting Wei; Yuxin Chen; Yuejie Chi

TL;DR

This work considers a multi-task setting in which each agent has its own private reward function corresponding to a different task, while all agents share the same transition kernel of the environment. It establishes near dimension-free global convergence for federated multi-task RL using policy optimization, which, to the best of the authors' knowledge, is the first result of this kind.

Abstract

Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories. In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks, while sharing the same transition kernel of the environment. Focusing on infinite-horizon Markov decision processes, the goal is to learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner, where each agent only communicates with its neighbors over some prescribed graph topology. We develop federated vanilla and entropy-regularized natural policy gradient (NPG) methods in the tabular setting under softmax parameterization, where gradient tracking is applied to estimate the global Q-function to mitigate the impact of imperfect information sharing. We establish non-asymptotic global convergence guarantees under exact policy evaluation, where the rates are nearly independent of the size of the state-action space and illuminate the impacts of network size and connectivity. To the best of our knowledge, this is the first time that near dimension-free global convergence is established for federated multi-task RL using policy optimization. We further go beyond the tabular setting by proposing a federated natural actor critic (NAC) method for multi-task RL with function approximation, and establish its finite-time sample complexity taking the errors of function approximation into account.
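The tabular method described in the abstract combines three ingredients: an NPG step under softmax parameterization, averaging with neighbors over the graph, and gradient tracking of the global Q-function. The following is a minimal sketch of how these pieces could fit together, not the paper's algorithm verbatim. The function names (`softmax`, `evaluate_q`, `mix`, `fed_npg`), the ordering of the update and mixing steps, the tracker initialization, and the use of fixed-point iteration in place of exact policy evaluation are all assumptions made for illustration. `W` is assumed to be a doubly stochastic mixing matrix, `P[s][a][s']` the shared transition kernel, and `rewards[n][s][a]` agent `n`'s private reward.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one state's action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def evaluate_q(P, r, pi, gamma, sweeps=200):
    """Policy evaluation by fixed-point iteration (approximates the exact Q)."""
    S, A = len(r), len(r[0])
    Q = [[0.0] * A for _ in range(S)]
    for _ in range(sweeps):
        V = [sum(pi[s][a] * Q[s][a] for a in range(A)) for s in range(S)]
        Q = [[r[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(S))
              for a in range(A)] for s in range(S)]
    return Q

def mix(W, X):
    """One communication round: agent n forms sum_m W[n][m] * X[m]."""
    N, S, A = len(X), len(X[0]), len(X[0][0])
    return [[[sum(W[n][m] * X[m][s][a] for m in range(N))
              for a in range(A)] for s in range(S)] for n in range(N)]

def fed_npg(P, rewards, W, gamma, eta, iters):
    """Return each agent's policy after `iters` rounds (illustrative only)."""
    N, S, A = len(rewards), len(P), len(P[0])
    logits = [[[0.0] * A for _ in range(S)] for _ in range(N)]  # uniform start
    policies = [[softmax(row) for row in agent] for agent in logits]
    Q = [evaluate_q(P, rewards[n], policies[n], gamma) for n in range(N)]
    T = [[row[:] for row in Q[n]] for n in range(N)]  # tracker starts at local Q
    scale = eta / (1.0 - gamma)
    for _ in range(iters):
        # NPG step in logit space using the tracked Q, then average with neighbors.
        stepped = [[[logits[n][s][a] + scale * T[n][s][a] for a in range(A)]
                    for s in range(S)] for n in range(N)]
        logits = mix(W, stepped)
        policies = [[softmax(row) for row in agent] for agent in logits]
        # Gradient tracking: add the change in each local Q, then average.
        new_Q = [evaluate_q(P, rewards[n], policies[n], gamma) for n in range(N)]
        T = mix(W, [[[T[n][s][a] + new_Q[n][s][a] - Q[n][s][a] for a in range(A)]
                     for s in range(S)] for n in range(N)])
        Q = new_Q
    return policies
```

In this sketch each agent only ever evaluates its own reward; information about the other tasks reaches it solely through the mixed tracker `T`, which is meant to approximate the average of the local Q-functions.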

Paper Structure (82 sections, 34 theorems, 348 equations, 1 table, 5 algorithms)

Introduction
Our contributions
Related work
Global convergence of NPG methods for tabular MDPs.
Convergence and sample complexity bounds of NAC.
Distributed and federated RL.
Decentralized first-order optimization algorithms.
Notation.
Model and backgrounds
Markov decision processes
Markov decision processes.
Entropy-regularized RL
Natural policy gradient methods
Vanilla NPG method.
Entropy-regularized NPG method.
...and 67 more sections

Key Result

Theorem 1

Suppose $\pi_n^{(0)}$, $n\in[N]$, are set as the uniform distribution. Then for $0<\eta\leq \eta_1\coloneqq \frac{(1-\sigma)^2(1-\gamma)^3}{16\sqrt{N}\sigma}$, we have [...]. Furthermore, the consensus error satisfies [...].
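The step-size cap $\eta_1$ depends on the network only through $N$ and $\sigma$. As a rough illustration, the sketch below evaluates it for a given mixing matrix. It assumes that $\sigma$ is the spectral norm of $W - \frac{1}{N}\mathbf{1}\mathbf{1}^\top$ for a doubly stochastic $W$, which the statement here does not spell out. The helper names `consensus_sigma` and `step_size_cap` are hypothetical, and the power iteration is just one convenient way to estimate the norm without external libraries.

```python
import math

def consensus_sigma(W, iters=200):
    """Estimate the spectral norm of M = W - (1/N) * ones * ones^T
    by power iteration on M^T M (assumed definition of sigma)."""
    N = len(W)
    M = [[W[i][j] - 1.0 / N for j in range(N)] for i in range(N)]
    x = [math.sqrt(i + 1.0) for i in range(N)]  # deterministic, non-constant start
    for _ in range(iters):
        y = [sum(M[i][j] * x[j] for j in range(N)) for i in range(N)]   # M x
        z = [sum(M[j][i] * y[j] for j in range(N)) for i in range(N)]   # M^T (M x)
        norm = math.sqrt(sum(v * v for v in z))
        if norm < 1e-300:  # M is (numerically) zero: perfect averaging
            return 0.0
        x = [v / norm for v in z]
    Mx = [sum(M[i][j] * x[j] for j in range(N)) for i in range(N)]
    return math.sqrt(sum(v * v for v in Mx))  # x has unit norm here

def step_size_cap(W, gamma):
    """eta_1 = (1 - sigma)^2 (1 - gamma)^3 / (16 sqrt(N) sigma), as in the theorem."""
    N = len(W)
    sigma = consensus_sigma(W)
    if sigma < 1e-12:
        return math.inf  # the formula places no finite cap when sigma is zero
    return (1.0 - sigma) ** 2 * (1.0 - gamma) ** 3 / (16.0 * math.sqrt(N) * sigma)
```

Because of the $(1-\gamma)^3$ factor and the $\sigma$ in the denominator, the cap shrinks quickly as the discount factor approaches 1 or as the graph becomes poorly connected ($\sigma$ close to 1).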

Theorems & Definitions (51)

Definition 1: spectral radius
Theorem 1: Global sublinear convergence of exact FedNPG (informal)
Corollary 1: Iteration complexity of exact FedNPG
Theorem 2: Global sublinear convergence of inexact FedNPG (informal)
Remark 1: sample complexity bound of inexact FedNPG
Theorem 3: Global linear convergence of exact entropy-regularized FedNPG (informal)
Corollary 2: Iteration complexity of exact entropy-regularized FedNPG
Theorem 4: Global linear convergence of inexact entropy-regularized FedNPG (informal)
Theorem 5: Convergence rate of Algorithm \ref{alg:actor_critic} (informal)
Theorem 6
...and 41 more
