Federated Control in Markov Decision Processes

Hao Jin, Yang Peng, Liangyu Zhang, Zhihua Zhang

TL;DR

This work tackles scalable reinforcement learning for MDPs by distributing state space among N agents with restricted regions. It introduces leakage probabilities to quantify inter-region connectivity and proposes FedQ, a Federated-Q protocol that periodically aggregates local Q-functions through an augmented local-MDP framework, yielding a federated Bellman operator with contraction properties. The authors establish convergence guarantees and derive general and specialized sample-complexity results (FedQ-X and FedQ-SynQ), showing linear speedup when workloads are uniformly distributed. Empirical results in RandomMDP and WindyCliff environments demonstrate faster convergence and scalable performance under region-specific data collection, supporting the practical viability of privacy-preserving federated control in large-scale RL. Overall, the framework provides provable efficiency guarantees for heterogeneous, privacy-conscious multi-agent RL in complex MDPs.

Abstract

We study federated control in Markov Decision Processes (MDPs). To solve an MDP with a large state space, multiple learning agents collaboratively learn its optimal policy without communicating locally collected experience. In our setting, the agents have limited capabilities: each is restricted to a different region of the overall state space during training. To handle the differences among restricted regions, we first introduce leakage probabilities to understand how such heterogeneity affects the learning process, and then propose a novel communication protocol, the Federated-Q protocol (FedQ), which periodically aggregates agents' knowledge of their restricted regions and modifies their learning problems accordingly for further training. On the theoretical side, we justify the correctness of FedQ as a communication protocol, give a general sample-complexity result for the derived algorithms FedQ-X with the RL oracle, and conduct a thorough study of the sample complexity of FedQ-SynQ. Specifically, FedQ-X is shown to enjoy linear speedup in sample complexity when the workload is uniformly distributed among agents. Finally, we carry out experiments in various environments to demonstrate the efficiency of our methods.

Paper Structure

This paper contains 34 sections, 13 theorems, 108 equations, 4 figures, and 2 algorithms.

Key Result

Lemma 5.1

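Every local Bellman operator $\mathcal{T}_\text{fed}^k$ satisfies the following contraction property: for any two Q-functions $Q_1$ and $Q_2$,

$$\|\mathcal{T}_\text{fed}^k Q_1-\mathcal{T}_\text{fed}^k Q_2\|_\infty\leq\gamma_\text{fed}^k\|Q_1-Q_2\|_\infty,$$

where $\gamma_\text{fed}^k=\frac{\gamma p^k_{\max}}{1-\gamma (1-p^k_{\max})}\leq\gamma$.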
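The bound $\gamma_\text{fed}^k\leq\gamma$ follows from a short rearrangement: $\frac{\gamma p}{1-\gamma(1-p)}\leq\gamma$ is equivalent to $p(1-\gamma)\leq 1-\gamma$, which holds whenever $p\leq 1$. The sketch below checks this numerically. It assumes $p^k_{\max}$ is a probability in $[0,1]$ (presumably the largest leakage probability of region $k$) and $\gamma\in[0,1)$; the function name `effective_discount` is hypothetical and does not come from the paper.

```python
def effective_discount(gamma: float, p_max: float) -> float:
    """Return gamma * p_max / (1 - gamma * (1 - p_max)).

    Assumptions (not taken from the paper's code): gamma is the discount
    factor in [0, 1) and p_max is a probability in [0, 1].
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if not 0.0 <= p_max <= 1.0:
        raise ValueError("p_max must lie in [0, 1]")
    # The denominator is at least 1 - gamma > 0, so the division is safe.
    return gamma * p_max / (1.0 - gamma * (1.0 - p_max))


if __name__ == "__main__":
    gamma = 0.9
    for p_max in (0.0, 0.1, 0.5, 1.0):
        g_fed = effective_discount(gamma, p_max)
        # A smaller p_max gives a smaller effective discount factor.
        print(f"p_max={p_max:.1f}  gamma_fed={g_fed:.4f}  <= gamma: {g_fed <= gamma}")
```

Under these assumptions the effective factor equals $\gamma$ only at $p^k_{\max}=1$ and shrinks toward $0$ as $p^k_{\max}$ decreases, so a smaller $p^k_{\max}$ gives a stronger contraction of the local operator.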

Figures (4)

  • Figure 1: Convergence of FedQ-SynQ: the first row is in RandomMDP with different numbers of $N$, the second row is in RandomMDP with different values of $p_{\max}$, and the third row is in WindyCliff with different values of wind power $p$.
  • Figure 2: Iteration complexity of FedQ in different orthogonal cases of RandomMDP with $(N,K_S,E_S)=(10,20,0)$.
  • Figure 3: Examples of WindyCliff with a $6\times 6$ state space: Red $\{S_k\}_{k=1}^3$ represents horizontal splitting; blue $\{S_k\}_{k=1}^3$ represents vertical splitting; $p$ indicates the power of wind blowing downwards.
  • Figure 4: Convergence of FedQ-SynQ in different environments: the first row with $E_S=0.5,K_S,N_S=3,N=5$; the second row with $K_S=20,N_S=3,N=5$; the third row with $K_S=10,E_S=5,N=10$; the fourth row with $10\times 10$ state space, $N=5$ agents and splitting direction as $v$; the fifth row with $6\times 6$ state space, $N=3$ agents and splitting direction as $v$; the sixth row with $6\times 6$ state space, $N=3$ agents and splitting direction as $h$.

Theorems & Definitions (21)

  • Lemma 5.1: Contraction property of $\mathcal{T}_\text{fed}^k$
  • Lemma 5.2: Contraction property of $\mathcal{T}_\text{fed}$
  • Lemma 5.3: Stationary point of $\mathcal{T}_\text{fed}$
  • Lemma 5.4: Convergence of FedQ-X
  • Theorem 5.1: Sample Complexity of FedQ-X
  • Theorem 5.2: Sample Complexity of FedQ-SynQ
  • Lemma A.1
  • proof
  • Lemma A.2: Contraction property of $\mathcal{T}_\text{fed}^k$
  • proof
  • ...and 11 more