Decentralized Federated Policy Gradient with Byzantine Fault-Tolerance and Provably Fast Convergence

Philip Jordan; Florian Grötschla; Flint Xiaofeng Fan; Roger Wattenhofer

TL;DR

This work addresses the challenge of Byzantine faults in decentralized Federated Reinforcement Learning (FRL) by introducing ByzPG, a centralized Byzantine-tolerant policy gradient method, and DecByzPG, a decentralized extension that dispenses with a trusted central agent. By combining robust aggregation with averaging agreement, the authors derive the first finite-time sample complexity guarantees for Byzantine-tolerant decentralized non-convex optimization in the RL setting. The theoretical results show that, under standard RL assumptions and a bounded fraction of Byzantine agents, the methods converge to an ε-stationary point with favorable scaling in the number of agents K and the Byzantine fraction α. Experiments further demonstrate speed-ups with larger federations and resilience against several Byzantine attacks. Overall, the work provides a principled, scalable framework for robust, decentralized FRL, with provable guarantees and empirical validation on common RL benchmarks.
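Robust aggregation can be illustrated with one standard rule, the coordinate-wise trimmed mean. The following is a minimal sketch, assuming at most a fraction α of the K submitted gradients are Byzantine; it is not necessarily the aggregator used by ByzPG or DecByzPG, and the function name `trimmed_mean` is hypothetical.

```python
import numpy as np

def trimmed_mean(grads: np.ndarray, alpha: float) -> np.ndarray:
    """Coordinate-wise trimmed mean of K gradient vectors (the rows of `grads`).

    Assumes at most floor(alpha * K) rows are Byzantine and discards that many
    of the largest and of the smallest values in every coordinate.
    """
    k = grads.shape[0]
    b = int(np.floor(alpha * k))  # values trimmed from each end, per coordinate
    if 2 * b >= k:
        raise ValueError("alpha too large: trimming would discard every value")
    sorted_vals = np.sort(grads, axis=0)  # sorts each coordinate independently
    return sorted_vals[b:k - b].mean(axis=0)

# 8 honest agents report gradients near (1, -1); 2 Byzantine agents report huge values.
rng = np.random.default_rng(0)
honest = np.array([1.0, -1.0]) + 0.01 * rng.standard_normal((8, 2))
byzantine = np.full((2, 2), 1e6)
agg = trimmed_mean(np.vstack([honest, byzantine]), alpha=0.2)
print(agg)  # close to (1, -1); the plain mean would be about 2e5 in each coordinate
```

Because the same number of values is dropped from both ends of every coordinate, a bounded number of arbitrarily large Byzantine values cannot pull the result outside the range of the honest values.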

Abstract

In Federated Reinforcement Learning (FRL), agents aim to collaboratively learn a common task, while each agent is acting in its local environment without exchanging raw trajectories. Existing approaches for FRL either (a) do not provide any fault-tolerance guarantees (against misbehaving agents), or (b) rely on a trusted central agent (a single point of failure) for aggregating updates. We provide the first decentralized Byzantine fault-tolerant FRL method. Towards this end, we first propose a new centralized Byzantine fault-tolerant policy gradient (PG) algorithm that improves over existing methods by relying only on assumptions standard for non-fault-tolerant PG. Then, as our main contribution, we show how a combination of robust aggregation and Byzantine-resilient agreement methods can be leveraged in order to eliminate the need for a trusted central entity. Since our results represent the first sample complexity analysis for Byzantine fault-tolerant decentralized federated non-convex optimization, our technical contributions may be of independent interest. Finally, we corroborate our theoretical results experimentally for common RL environments, demonstrating the speed-up of decentralized federations w.r.t. the number of participating agents and resilience against various Byzantine attacks.
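The way Byzantine-resilient agreement replaces the central aggregator can be sketched with a simplified, hypothetical protocol: every honest agent repeatedly replaces its vector by a trimmed mean of all vectors it receives. This assumes a synchronous, fully connected network in which Byzantine agents may send different values to different receivers. It follows the spirit of classic approximate agreement rather than the paper's averaging agreement procedure, and all names (`agreement_rounds`, `spread`) are illustrative.

```python
import numpy as np

def trimmed_mean(vectors: np.ndarray, b: int) -> np.ndarray:
    """Coordinate-wise mean after dropping the b largest and b smallest values."""
    s = np.sort(vectors, axis=0)
    return s[b:len(vectors) - b].mean(axis=0)

def spread(x: np.ndarray) -> np.ndarray:
    """Per-coordinate range (max minus min) across the honest agents' vectors."""
    return x.max(axis=0) - x.min(axis=0)

def agreement_rounds(honest: np.ndarray, n_byz: int, rounds: int,
                     rng: np.random.Generator) -> np.ndarray:
    """Hypothetical synchronous agreement loop on a complete communication graph.

    Each round, every honest agent receives all honest vectors plus n_byz
    Byzantine vectors (drawn separately for each receiver, modelling
    equivocation) and replaces its own vector by their trimmed mean.
    """
    x = honest.copy()
    m, d = x.shape
    if m <= 2 * n_byz:
        raise ValueError("this sketch needs more than 2 * n_byz honest agents")
    for _ in range(rounds):
        new_x = np.empty_like(x)
        for i in range(m):
            fake = rng.uniform(-100.0, 100.0, size=(n_byz, d))
            new_x[i] = trimmed_mean(np.vstack([x, fake]), n_byz)
        x = new_x  # synchronous update: all agents used the previous round's vectors
    return x

rng = np.random.default_rng(1)
start = rng.standard_normal((7, 3))  # 7 honest agents holding 3-dimensional updates
end = agreement_rounds(start, n_byz=2, rounds=10, rng=rng)
print(spread(start))
print(spread(end))  # at most (2/5)**10 times the initial spread, per coordinate
```

With m honest and b Byzantine agents, each round shrinks the per-coordinate spread of the honest vectors by a factor of at most b/(m - b), while keeping every honest vector inside the range of the honest inputs. The loop therefore needs m > 2b, i.e. more than 3b agents in total.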

Paper Structure (31 sections, 22 theorems, 39 equations, 6 figures, 1 table, 2 algorithms)

Introduction
Background & Related work
Setup and Assumptions
  Distributed Computing Setup
  Reinforcement Learning Assumptions
Centralized Byzantine-tolerant federated PG
  Method
  Convergence Analysis and Sample Complexity
Decentralized Byzantine-tolerant federated PG
  Method
  Convergence Analysis and Sample Complexity
Experiments
  DecByzPG without Byzantine Agents
  DecByzPG under Attack
Conclusion
...and 16 more sections

Key Result

Proposition 1

Under the assumptions above (bounded policy, smoothness, finite variance, and finite importance-sampling variance), with $g(\tau\mid\theta)$ denoting the REINFORCE or GPOMDP gradient estimator, we have for all $\theta,\theta_1,\theta_2 \in \mathbb{R}^d$:
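The two estimators named in Proposition 1 can be sketched for a tabular softmax policy. For a trajectory of horizon $H$ (notation assumed here), GPOMDP uses $g(\tau\mid\theta)=\sum_{h=0}^{H-1}\big(\sum_{t=0}^{h}\nabla_\theta\log\pi_\theta(a_t\mid s_t)\big)\gamma^h r_h$, while REINFORCE multiplies the full sum of score terms by the full discounted return. The policy class, the function names, and the absence of a baseline are illustrative assumptions; the bounds of Proposition 1 are not reproduced here.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def grad_log_pi(theta: np.ndarray, s: int, a: int) -> np.ndarray:
    """Score function of a tabular softmax policy pi(a | s) proportional to exp(theta[s, a])."""
    g = np.zeros_like(theta)
    g[s] = -softmax(theta[s])
    g[s, a] += 1.0
    return g

def reinforce(theta, states, actions, rewards, gamma):
    """REINFORCE: total score times total discounted return (no baseline)."""
    total_score = sum(grad_log_pi(theta, s, a) for s, a in zip(states, actions))
    ret = sum(gamma ** h * r for h, r in enumerate(rewards))
    return total_score * ret

def gpomdp(theta, states, actions, rewards, gamma):
    """GPOMDP: reward at step h is credited only to actions taken at steps t <= h."""
    g = np.zeros_like(theta)
    cum_score = np.zeros_like(theta)
    for h, (s, a, r) in enumerate(zip(states, actions, rewards)):
        cum_score += grad_log_pi(theta, s, a)
        g += cum_score * (gamma ** h * r)
    return g

theta = np.zeros((2, 2))  # 2 states, 2 actions: the uniform policy
states, actions, rewards = [0, 1, 0], [1, 0, 0], [0.0, 1.0, 0.5]
print(reinforce(theta, states, actions, rewards, gamma=0.99))
print(gpomdp(theta, states, actions, rewards, gamma=0.99))
```

When the only reward arrives at the final step, the two estimators coincide. Otherwise GPOMDP drops the terms that pair a reward with later actions; those terms have zero mean, so dropping them keeps the estimator unbiased and typically lowers its variance.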

Figures (6)

Figure 1: Performance of DecByzPG for different federation sizes when all agents behave honestly (i.e. $\alpha=0$).
Figure 2: Performance & resilience of DecByzPG for CartPole w.r.t. our three attack types.
Figure 3: Performance & resilience of DecByzPG for LunarLander w.r.t. our three attack types.
Figure 4: Performance of ByzPG for different federation sizes when all agents behave honestly (i.e. $\alpha=0$).
Figure 5: Performance & resilience of ByzPG for CartPole w.r.t. our three attack types.
...and 1 more figure

Theorems & Definitions (26)

Proposition 1
Definition 1: robust aggregation
Theorem 1
Corollary 1
Definition 2: $K$-agent $\alpha$-tolerant $\epsilon$-approximate solution
Definition 3: Averaging Agreement
Theorem 2
Corollary 2
Remark
Lemma 1
...and 16 more
