The Sample-Communication Complexity Trade-off in Federated Q-Learning

Sudeep Salgia, Yuejie Chi

TL;DR

This paper investigates the trade-off between sample and communication complexities for the widely used class of intermittent communication algorithms. It also proposes a new algorithm, Fed-DVR-Q, the first federated Q-learning algorithm to simultaneously achieve order-optimal sample and communication complexities.

Abstract

We consider the problem of federated Q-learning, where $M$ agents aim to collaboratively learn the optimal Q-function of an unknown infinite-horizon Markov decision process with finite state and action spaces. We investigate the trade-off between sample and communication complexities for the widely used class of intermittent communication algorithms. We first establish a converse result, showing that any federated Q-learning algorithm that offers a speedup with respect to the number of agents in the per-agent sample complexity must incur a communication cost of at least the order of $\frac{1}{1-\gamma}$ up to logarithmic factors, where $\gamma$ is the discount factor. We also propose a new algorithm, called Fed-DVR-Q, which is the first federated Q-learning algorithm to simultaneously achieve order-optimal sample and communication complexities. Together, these results provide a complete characterization of the sample-communication complexity trade-off in federated Q-learning.
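To make the setting concrete, the following is a minimal sketch of a generic federated Q-learning loop with intermittent communication: each agent runs local synchronous Q-learning updates from a generative model, and a server averages the local Q-tables every few steps. It is not Fed-DVR-Q, and it is not taken from the paper. The function name `federated_q_learning`, the parameter `comm_period`, the constant step size, and the plain-averaging rule are all assumptions chosen for illustration.

```python
import numpy as np


def federated_q_learning(P, R, gamma, num_agents, num_steps, comm_period, step_size, seed=0):
    """Local synchronous Q-learning with periodic averaging (illustrative sketch only).

    P: transition kernel of shape (S, A, S); R: reward table of shape (S, A).
    Returns the averaged Q-table and the number of communication rounds used.
    All names and the averaging rule are assumptions, not the paper's algorithm.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    cdf = np.cumsum(P, axis=-1)              # cumulative next-state distribution per (s, a)
    Q = np.zeros((num_agents, S, A))         # one local Q-table per agent
    rounds = 0
    for t in range(1, num_steps + 1):
        # Generative model: every agent draws one next state for every (s, a) pair
        # by inverse-CDF sampling.
        u = rng.random((num_agents, S, A, 1))
        next_states = np.minimum((u > cdf).sum(axis=-1), S - 1)
        # Empirical Bellman target r + gamma * max_a' Q_m(s', a'), using each agent's own table.
        V = Q.max(axis=-1)                   # shape (num_agents, S)
        flat_idx = next_states.reshape(num_agents, S * A)
        next_values = np.take_along_axis(V, flat_idx, axis=1).reshape(num_agents, S, A)
        Q = (1.0 - step_size) * Q + step_size * (R + gamma * next_values)
        # Intermittent communication: average and broadcast every comm_period steps.
        if t % comm_period == 0:
            Q[:] = Q.mean(axis=0)
            rounds += 1
    return Q.mean(axis=0), rounds


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP in which the action chosen is the next state.
    P = np.zeros((2, 2, 2))
    P[:, 0, 0] = 1.0
    P[:, 1, 1] = 1.0
    R = np.array([[0.0, 1.0], [0.0, 1.0]])
    Q_hat, rounds = federated_q_learning(P, R, gamma=0.5, num_agents=3,
                                         num_steps=200, comm_period=20, step_size=0.5)
    print(Q_hat, rounds)
```

In this sketch the number of communication rounds is `num_steps // comm_period`. That round count is the quantity the converse result constrains: under the abstract's claim, it cannot fall below the order of $\frac{1}{1-\gamma}$ (up to logarithmic factors) if the per-agent sample complexity is to improve with the number of agents.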

Paper Structure (69 sections, 12 theorems, 152 equations, 4 figures, 1 table, 5 algorithms)

Key Result

Theorem 1

Assume that $\gamma \in [5/6, 1)$ and the state and action spaces satisfy $|\mathcal{S}| \geq 4$ and $|\mathcal{A}| \geq 2$. Let $\mathscr{A}$ be a federated Q-learning algorithm with intermittent communication (as described in the paper's general algorithm for federated Q-learning with intermittent communication) that is run for $T \geq \ldots$ [...] for some universal constants $c_0, c_1 > 0$, then for all choices of communication schedule, batch [...]

Figures (4)

  • Figure 1: Comparison between sample and communication complexities of Fed-DVR-Q and the algorithm Fed-SynQ [Woo2023FedSynQ].
  • Figure 2: Dependence of sample and communication complexities of Fed-DVR-Q on the number of agents.
  • Figure 3: Communication complexity of Fed-DVR-Q as a function of effective horizon, i.e., $\frac{1}{1-\gamma}$.
  • Figure: Fed-DVR-Q

Theorems & Definitions (14)

  • Theorem 1
  • Remark 1: Communication complexity of policy evaluation
  • Remark 2: Extension to asynchronous Q-learning
  • Theorem 2
  • Lemma 1: [Li2023QLMinimax]
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • ...and 4 more