Federated Control in Markov Decision Processes

Hao Jin; Yang Peng; Liangyu Zhang; Zhihua Zhang

TL;DR

This work tackles scalable reinforcement learning in MDPs by dividing a large state space among $N$ agents, each restricted to its own region. It introduces leakage probabilities to quantify connectivity between regions and proposes FedQ, a Federated-Q protocol that periodically aggregates local Q-functions through an augmented local-MDP framework, yielding a federated Bellman operator with contraction properties. The authors establish convergence guarantees, give a general sample-complexity result for FedQ-X and a specialized one for FedQ-SynQ, and show that FedQ-X enjoys linear speedup when the workload is uniformly distributed among agents. Experiments in the RandomMDP and WindyCliff environments show faster convergence under region-specific data collection, without agents sharing their locally collected experience.

Abstract

We study federated control in Markov Decision Processes (MDPs). To solve an MDP with a large state space, multiple learning agents collaboratively learn its optimal policy without communicating locally collected experience. In our setting, these agents have limited capabilities: each is restricted to a different region of the overall state space during training. To handle the differences among restricted regions, we first introduce leakage probabilities to understand how such heterogeneity affects the learning process, and then propose a novel communication protocol, the Federated-Q protocol (FedQ), which periodically aggregates agents' knowledge of their restricted regions and modifies their learning problems accordingly for further training. On the theoretical side, we justify the correctness of FedQ as a communication protocol, give a general sample-complexity result for the derived algorithms FedQ-X with the RL oracle, and conduct a thorough study of the sample complexity of FedQ-SynQ. Specifically, FedQ-X is shown to enjoy linear speedup in sample complexity when the workload is uniformly distributed among agents. Finally, we carry out experiments in various environments to demonstrate the efficiency of our methods.
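The periodic aggregation step can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the paper's exact rule: we assume each agent holds a Q-table over all states, that only the rows for states inside its restricted region are trusted, and that the aggregate for a state is the average over the agents whose region contains it. The function name `aggregate_q`, the boolean-mask representation of regions, and the averaging rule are all hypothetical.

```python
import numpy as np

def aggregate_q(local_qs, region_masks):
    """Combine local Q-tables into one global Q-table (illustrative only).

    local_qs:     array-like of shape (N, S, A), one Q-table per agent.
    region_masks: array-like of shape (N, S), True where state s lies in
                  agent k's restricted region (assumed representation).
    """
    local_qs = np.asarray(local_qs, dtype=float)
    masks = np.asarray(region_masks, dtype=bool)

    # Number of agents whose region covers each state.
    counts = masks.sum(axis=0)
    if np.any(counts == 0):
        raise ValueError("every state must belong to at least one region")

    # Zero out rows outside each agent's region, then average per state.
    covered_sum = (local_qs * masks[:, :, None]).sum(axis=0)
    return covered_sum / counts[:, None]


# Two agents, three states, two actions; state 1 is shared by both regions.
q_global = aggregate_q(
    np.stack([np.ones((3, 2)), 3.0 * np.ones((3, 2))]),
    [[True, True, False], [False, True, True]],
)
print(q_global[:, 0])
```

In this toy call, state 0 takes agent 0's value, state 2 takes agent 1's value, and the shared state 1 takes their mean. No transitions or rewards are exchanged, only Q-values, which matches the abstract's constraint that locally collected experience is not communicated.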

Paper Structure (34 sections, 13 theorems, 108 equations, 4 figures, 2 algorithms)

This paper contains 34 sections, 13 theorems, 108 equations, 4 figures, 2 algorithms.

Introduction
Preliminaries
Classical control in Markov Decision Processes
Basics of Q-learning
Synchronous Q-learning
Federated Control in MDPs
General Problem Formulation
Leakage Probabilities in Federated Control
FedQ: Federated-Q Protocol
Augmented Local Markov Decision Process
Federated-Q Protocol
Analysis of FedQ
Convergence of FedQ
Local Bellman Operators
Federated Bellman Operator
...and 19 more sections

Key Result

Lemma 5.1

Any local Bellman operator $\mathcal{T}_\text{fed}^k$ satisfies the contraction property $\|\mathcal{T}_\text{fed}^k Q-\mathcal{T}_\text{fed}^k Q'\|_\infty\leq\gamma_\text{fed}^k\|Q-Q'\|_\infty$, where $\gamma_\text{fed}^k=\frac{\gamma p^k_{\max}}{1-\gamma (1-p^k_{\max})}\leq\gamma$.
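The bound $\gamma_\text{fed}^k\leq\gamma$ follows from simple algebra: for $\gamma\in[0,1)$ the denominator is positive, so $\frac{\gamma p}{1-\gamma(1-p)}\leq\gamma$ reduces to $p(1-\gamma)\leq 1-\gamma$, that is, $p\leq 1$. The short script below evaluates the factor numerically. It is a sketch for intuition only: the function name `fed_discount` is hypothetical, and we assume $p^k_{\max}$ is a probability in $[0,1]$ (the lemma's name suggests the largest leakage probability of agent $k$, but its definition is not given in this summary).

```python
def fed_discount(gamma, p_max):
    """Contraction factor gamma * p_max / (1 - gamma * (1 - p_max)).

    gamma: discount factor, assumed to lie in [0, 1).
    p_max: assumed to be a probability in [0, 1].
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if not 0.0 <= p_max <= 1.0:
        raise ValueError("p_max must lie in [0, 1]")
    return gamma * p_max / (1.0 - gamma * (1.0 - p_max))


gamma = 0.9
for p_max in (0.0, 0.1, 0.5, 1.0):
    g_fed = fed_discount(gamma, p_max)
    # The factor never exceeds gamma and equals gamma only at p_max = 1.
    print(f"p_max={p_max:.1f}  gamma_fed={g_fed:.4f}  <= gamma: {g_fed <= gamma}")
```

With $\gamma=0.9$, the factor drops from $0.9$ at $p^k_{\max}=1$ to roughly $0.47$ at $p^k_{\max}=0.1$, so under this reading a smaller $p^k_{\max}$ gives a strictly stronger contraction for the local operator.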

Figures (4)

Figure 1: Convergence of FedQ-SynQ: the first row is in RandomMDP with different values of $N$, the second row is in RandomMDP with different values of $p_{\max}$, and the third row is in WindyCliff with different values of wind power $p$.
Figure 2: Iteration complexity of FedQ in different orthogonal cases of RandomMDP with $(N,K_S,E_S)=(10,20,0)$.
Figure 3: Examples of WindyCliff with a $6\times 6$ state space: red $\{S_k\}_{k=1}^3$ represents horizontal splitting; blue $\{S_k\}_{k=1}^3$ represents vertical splitting; $p$ indicates the power of the wind blowing downwards.
Figure 4: Convergence of FedQ-SynQ in different environments: the first row with $E_S=0.5,K_S,N_S=3,N=5$; the second row with $K_S=20,N_S=3,N=5$; the third row with $K_S=10,E_S=5,N=10$; the fourth row with $10\times 10$ state space, $N=5$ agents and splitting direction as $v$; the fifth row with $6\times 6$ state space, $N=3$ agents and splitting direction as $v$; the sixth row with $6\times 6$ state space, $N=3$ agents and splitting direction as $h$.

Theorems & Definitions (21)

Lemma 5.1: Contraction property of $\mathcal{T}_\text{fed}^k$
Lemma 5.2: Contraction property of $\mathcal{T}_\text{fed}$
Lemma 5.3: Stationary point of $\mathcal{T}_\text{fed}$
Lemma 5.4: Convergence of FedQ-X
Theorem 5.1: Sample Complexity of FedQ-X
Theorem 5.2: Sample Complexity of FedQ-SynQ
Lemma A.1
proof
Lemma A.2: Contraction property of $\mathcal{T}_\text{fed}^k$
proof
...and 11 more
