Global Convergence Guarantees for Federated Policy Gradient Methods with Adversaries

Swetha Ganesh, Jiayu Chen, Gugan Thoppe, Vaneet Aggarwal

TL;DR

This work proposes a policy-gradient-based approach that is robust to adversarial agents, which can send arbitrary values to the server. Its results form the first global convergence guarantees with general parametrization.

Abstract

Federated Reinforcement Learning (FRL) allows multiple agents to collaboratively build a decision-making policy without sharing raw trajectories. However, if even a small fraction of these agents is adversarial, the results can be catastrophic. We propose a policy-gradient-based approach that is robust to adversarial agents, which can send arbitrary values to the server. Under this setting, our results form the first global convergence guarantees with general parametrization. These results demonstrate resilience to adversaries while achieving optimal sample complexity of order $\tilde{\mathcal{O}}\left( \frac{1}{N\epsilon^2} \left( 1+ \frac{f^2}{N}\right)\right)$, where $N$ is the total number of agents and $f<N/2$ is the number of adversarial agents.
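To make the scaling in the stated bound concrete, the following minimal sketch evaluates the expression inside the $\tilde{\mathcal{O}}(\cdot)$. The function name `sample_complexity_order` is hypothetical, and constants and logarithmic factors are dropped, so the returned value only indicates how the bound scales with $N$, $f$ and $\epsilon$; it is not a sample count.

```python
def sample_complexity_order(num_agents, num_adversaries, eps):
    """Evaluate (1 / (N * eps^2)) * (1 + f^2 / N).

    Hypothetical helper for illustration only: constants and log
    factors hidden by the O-tilde notation are ignored.
    """
    if not 0 <= num_adversaries < num_agents / 2:
        raise ValueError("the bound assumes 0 <= f < N/2")
    if eps <= 0:
        raise ValueError("eps must be positive")
    base = 1.0 / (num_agents * eps ** 2)                 # adversary-free term
    penalty = 1.0 + num_adversaries ** 2 / num_agents    # cost of f adversaries
    return base * penalty


# Ratio between the bound with f adversaries and the adversary-free bound.
no_adversary = sample_complexity_order(10, 0, 0.1)
with_adversary = sample_complexity_order(10, 3, 0.1)
overhead = with_adversary / no_adversary
```

With $f = 0$ the expression reduces to $1/(N\epsilon^2)$, so the bound improves linearly in the number of agents. With $N = 10$ and $f = 3$, the multiplicative overhead is $1 + 9/10 = 1.9$.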

Paper Structure (15 sections, 10 theorems, 68 equations, 4 figures, 1 table, 1 algorithm)

Key Result

Theorem 3.5

Consider the (N)-HARPG algorithm with $\gamma_t = \frac{6G_1}{\mu_F(t+2)}$, $\eta_t = \frac{1}{t}$ and $H = (1-\gamma)^{-1}\log(T+1)$. Let the assumptions on compatible error, strong convexity, variance, and conditions on the score function hold. Then for every $T \geq 1$ the o

Figures (4)

  • Figure 1: Evaluation results of Res-NHARPG on CartPole and InvertedPendulum. We test Res-NHARPG with the six aggregators shown in the sample-complexity table. As baselines, we select Res-NHARPG with a simple mean (SM) function as the aggregator, which is equivalent to the original N-HARPG algorithm, and a vanilla policy gradient method with the simple mean aggregator (PG-SM). For each environment, there are ten workers, of which three are adversaries, and we simulate three types of attacks: random noise, random action, and sign flipping. N-HARPG outperforms PG, and Res-NHARPG with the $(f, \lambda)$-resilient aggregators effectively handles multiple types of attacks during learning.
  • Figure 2: Res-NHARPG with the MDA aggregator consistently outperforms the baselines, N-HARPG (i.e., SM) and vanilla PG (i.e., PG-SM), on a series of MuJoCo tasks.
  • Figure 3: Evaluation of Fed-ADMM [lan2023improved] on MuJoCo tasks with random noise. The solid lines represent the mean performance, while the shaded areas indicate the 95% confidence intervals from repeated experiments. We used the official implementation from [lan2023improved].
  • Figure (unnumbered): Resilient Normalized Hessian-Aided Recursive Policy Gradient (Res-NHARPG)
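
The setup described for Figure 1 (ten workers, three of them adversarial, a sign-flipping attack, and a simple mean baseline) can be illustrated with a toy aggregation step. This is a sketch under stated assumptions, not the paper's implementation: the coordinate-wise median is used here only as a well-known robust aggregator, and whether it is one of the six aggregators evaluated in the paper is an assumption. The attack scale, noise level, and all function names are hypothetical.

```python
import random
import statistics


def simple_mean(vectors):
    """Simple mean (SM) aggregator: coordinate-wise average of all updates."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def coordinate_wise_median(vectors):
    """Robust aggregator (assumed for illustration): median of each coordinate."""
    dim = len(vectors[0])
    return [statistics.median(v[i] for v in vectors) for i in range(dim)]


def sign_flip(vector, scale=10.0):
    """Hypothetical sign-flipping attack: negate and amplify the update."""
    return [-scale * x for x in vector]


rng = random.Random(0)              # fixed seed so the toy run is repeatable
true_grad = [1.0, -2.0, 0.5]        # stand-in for the true update direction

# Seven honest workers send noisy copies of the true direction.
honest = [[g + rng.gauss(0.0, 0.1) for g in true_grad] for _ in range(7)]
# Three adversarial workers send sign-flipped, amplified updates.
adversarial = [sign_flip(true_grad) for _ in range(3)]

updates = honest + adversarial
mean_est = simple_mean(updates)
median_est = coordinate_wise_median(updates)
```

In this toy run the simple mean is dragged to the opposite sign of the true direction in every coordinate, while the coordinate-wise median stays close to it because the honest workers form a majority ($f < N/2$).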

Theorems & Definitions (15)

  • Definition 2.1: $(f, \, \lambda)$-Resilient averaging
  • Theorem 3.5
  • Lemma 3.6 [alistarh2018byzantine]
  • Remark 3.7
  • Remark 3.8
  • Remark 3.9
  • Lemma 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Lemma C.1
  • ...and 5 more