The Sample-Communication Complexity Trade-off in Federated Q-Learning

Sudeep Salgia; Yuejie Chi

TL;DR

The trade-off between sample and communication complexities for the widely used class of intermittent communication algorithms is investigated and a new algorithm, called Fed-DVR-Q, is proposed, which is the first federated Q-learning algorithm to simultaneously achieve order-optimal sample and communication complexities.

Abstract

We consider the problem of federated Q-learning, where $M$ agents aim to collaboratively learn the optimal Q-function of an unknown infinite-horizon Markov decision process with finite state and action spaces. We investigate the trade-off between sample and communication complexities for the widely used class of intermittent communication algorithms. We first establish the converse result, where it is shown that a federated Q-learning algorithm that offers any speedup with respect to the number of agents in the per-agent sample complexity needs to incur a communication cost of at least an order of $\frac{1}{1-\gamma}$ up to logarithmic factors, where $\gamma$ is the discount factor. We also propose a new algorithm, called Fed-DVR-Q, which is the first federated Q-learning algorithm to simultaneously achieve order-optimal sample and communication complexities. Thus, together these results provide a complete characterization of the sample-communication complexity trade-off in federated Q-learning.
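To make the setting concrete, the following is a minimal sketch of a generic intermittent-communication scheme: each of the $M$ agents runs synchronous Q-learning on its own samples and the agents average their Q-tables every few iterations. It is not Fed-DVR-Q (there is no variance reduction or quantization here). The random MDP, the step-size rule, and all names such as `sync_every` are illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np


def make_random_mdp(num_states, num_actions, rng):
    """Hypothetical toy MDP: random transition kernel, rewards in [0, 1]."""
    # P[s, a] is a probability vector over next states.
    P = rng.dirichlet(np.ones(num_states), size=(num_states, num_actions))
    R = rng.uniform(0.0, 1.0, size=(num_states, num_actions))
    return P, R


def optimal_q(P, R, gamma, iters=2000):
    """Reference solution via plain Q-value iteration (needs the true model)."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q


def federated_q_learning(P, R, gamma, num_agents, num_steps, sync_every, rng):
    """Local synchronous Q-learning with periodic averaging of the Q-tables.

    Returns the averaged Q-table and the number of communication rounds.
    """
    S, A = R.shape
    Q = np.zeros((num_agents, S, A))
    cdf = P.cumsum(axis=-1)
    agent_idx = np.arange(num_agents)[:, None, None]
    rounds = 0
    for t in range(1, num_steps + 1):
        # Assumed step-size schedule, chosen only for illustration.
        lr = 1.0 / (1.0 + (1.0 - gamma) * t)
        # Each agent draws one next state per (s, a) pair (generative model).
        u = rng.random((num_agents, S, A, 1))
        next_s = np.minimum((u > cdf).sum(axis=-1), S - 1)
        V = Q.max(axis=2)                       # shape (num_agents, S)
        target = R[None] + gamma * V[agent_idx, next_s]
        Q = (1.0 - lr) * Q + lr * target        # local update, no communication
        if t % sync_every == 0:
            Q[:] = Q.mean(axis=0)               # one communication round
            rounds += 1
    return Q.mean(axis=0), rounds


rng = np.random.default_rng(0)
gamma = 0.9
P, R = make_random_mdp(num_states=4, num_actions=2, rng=rng)
Q_star = optimal_q(P, R, gamma)
Q_hat, rounds = federated_q_learning(
    P, R, gamma, num_agents=8, num_steps=2000, sync_every=50, rng=rng
)
print("communication rounds:", rounds)
print("sup-norm error:", np.abs(Q_hat - Q_star).max())
```

Varying `sync_every` and `num_agents` in this toy setup gives a rough feel for the trade-off the abstract describes: fewer averaging rounds reduce communication but can weaken the benefit of having more agents.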

Paper Structure (69 sections, 12 theorems, 152 equations, 4 figures, 1 table, 5 algorithms)

Introduction
Main results
Related work
Single-agent Q-learning.
Federated and distributed RL.
Federated Q-learning.
Accuracy-communication trade-off in federated learning.
Background and Problem Formulation
Markov decision processes
Performance measures in federated Q-learning
Intermittent-communication algorithm protocols
Communication Complexity Lower Bound
The Fed-DVR-Q Algorithm
Algorithm description
The RefineEstimate sub-routine
...and 54 more sections

Key Result

Theorem 1

Assume that $\gamma \in [5/6, 1)$ and the state and action spaces satisfy $|\mathcal{S}| \geq 4$ and $|\mathcal{A}| \geq 2$. Let $\mathscr{A}$ be a federated Q-learning algorithm with intermittent communication (as described in the paper's generic intermittent-communication algorithm) that is run for $T \geq \dots$ for some universal constants $c_0, c_1 > 0$. Then, for all choices of communication schedule, batch …

Figures (4)

Figure 1: Comparison between sample and communication complexities of Fed-DVR-Q and the algorithm Fed-SynQ from [Woo2023FedSynQ].
Figure 2: Dependence of sample and communication complexities of Fed-DVR-Q on the number of agents.
Figure 3: Communication complexity of Fed-DVR-Q as a function of effective horizon, i.e., $\frac{1}{1-\gamma}$.
Figure : Fed-DVR-Q

Theorems & Definitions (14)

Theorem 1
Remark 1: Communication complexity of policy evaluation
Remark 2: Extension to asynchronous Q-learning
Theorem 2
Lemma 1 [Li2023QLMinimax]
Lemma 2
Lemma 3
Lemma 4
Lemma 5
Lemma 6
...and 4 more
