
Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents

Safwan Labbi, Daniil Tiapkin, Lorenzo Mancini, Paul Mangold, Eric Moulines

TL;DR

Unlike existing federated reinforcement learning approaches, Fed-UCBVI's communication complexity increases only marginally with the number of agents, and its regret scales as $\tilde{\mathcal{O}}(\sqrt{H^3 |\mathcal{S}| |\mathcal{A}| T / M})$, with a small additional term due to heterogeneity.
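To make the dependence on $M$ concrete, the short Python sketch that follows evaluates only the leading term $\sqrt{H^3 |\mathcal{S}| |\mathcal{A}| T / M}$ of the bound. Constants, polylogarithmic factors and the heterogeneity term are dropped, so the numbers illustrate scaling, not actual regret values. The function name `regret_leading_term` and the values chosen for $H$, $|\mathcal{S}|$, $|\mathcal{A}|$ and $T$ are hypothetical and are not taken from the paper's experiments.

```python
import math


def regret_leading_term(H, S, A, T, M):
    """Leading term sqrt(H^3 |S| |A| T / M) of the regret bound.

    Hypothetical helper: constants, polylog factors and the heterogeneity
    term are ignored, so only the scaling in H, S, A, T and M is meaningful.
    """
    return math.sqrt(H**3 * S * A * T / M)


# Illustrative (assumed) problem size; only M varies.
H, S, A, T = 10, 20, 4, 30_000
base = regret_leading_term(H, S, A, T, M=1)
for M in (1, 4, 16, 64):
    ratio = regret_leading_term(H, S, A, T, M) / base
    print(M, round(ratio, 4))  # ratio equals 1 / sqrt(M)
```

With $H$, $|\mathcal{S}|$, $|\mathcal{A}|$ and $T$ held fixed, multiplying the number of agents by 4 halves the leading term.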

Abstract

In this paper, we present the Federated Upper Confidence Bound Value Iteration algorithm ($\texttt{Fed-UCBVI}$), a novel extension of the $\texttt{UCBVI}$ algorithm (Azar et al., 2017) tailored to the federated learning framework. We prove that the regret of $\texttt{Fed-UCBVI}$ scales as $\tilde{\mathcal{O}}(\sqrt{H^3 |\mathcal{S}| |\mathcal{A}| T / M})$, with a small additional term due to heterogeneity, where $|\mathcal{S}|$ is the number of states, $|\mathcal{A}|$ is the number of actions, $H$ is the episode length, $M$ is the number of agents, and $T$ is the number of episodes. Notably, in the single-agent setting, this upper bound matches the minimax lower bound up to polylogarithmic factors, while in the multi-agent setting, $\texttt{Fed-UCBVI}$ achieves a linear speed-up. To conduct our analysis, we introduce a new measure of heterogeneity, which may be of independent theoretical interest. Furthermore, we show that, unlike existing federated reinforcement learning approaches, $\texttt{Fed-UCBVI}$'s communication complexity increases only marginally with the number of agents.
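For readers unfamiliar with the base algorithm, single-agent $\texttt{UCBVI}$ (Azar et al., 2017) computes optimistic Q-values by backward induction on an empirical transition model plus an exploration bonus. The Python sketch that follows shows only that single-agent step under simplifying assumptions: rewards in $[0, 1]$ are treated as known, the bonus is a generic Hoeffding-style term $c \, H \sqrt{\log(|\mathcal{S}||\mathcal{A}|H/\delta)/N(s,a)}$ whose constant `c` and log factor are hypothetical, and the function name `ucbvi_backward_induction` is illustrative. It is not the $\texttt{Fed-UCBVI}$ procedure from the paper, which additionally handles heterogeneous agents and communication rounds.

```python
import math


def ucbvi_backward_induction(P_hat, R, N, H, delta=0.1, c=1.0):
    """Optimistic backward induction in the style of single-agent UCBVI.

    Hypothetical sketch, not the paper's Fed-UCBVI update.
    P_hat[s][a][t]: empirical probability of moving from s to t under a.
    R[s][a]: mean reward in [0, 1], assumed known for simplicity.
    N[s][a]: number of visits to (s, a) so far.
    Returns Q[h][s][a] for h = 0..H-1 and V[h][s] for h = 0..H (V[H] = 0).
    """
    S, A = len(R), len(R[0])
    # Assumed log factor; exact constants differ across analyses.
    log_term = math.log(S * A * H / delta)
    V = [[0.0] * S for _ in range(H + 1)]
    Q = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    for h in reversed(range(H)):
        for s in range(S):
            for a in range(A):
                n = max(N[s][a], 1)  # avoid division by zero for unvisited pairs
                bonus = c * H * math.sqrt(log_term / n)  # Hoeffding-style bonus
                next_value = sum(P_hat[s][a][t] * V[h + 1][t] for t in range(S))
                # Optimistic estimate, clipped at the largest achievable return H - h.
                Q[h][s][a] = min(float(H - h), R[s][a] + next_value + bonus)
            V[h][s] = max(Q[h][s])
    return Q, V


# One state, two actions: action 0 pays 1, action 1 pays 0, both stay in state 0.
# With huge visit counts the bonus is negligible: V[0][0] == 2.0 and Q[1][0][1] is close to 0.
Q, V = ucbvi_backward_induction(P_hat=[[[1.0], [1.0]]], R=[[1.0, 0.0]],
                                N=[[10**12, 10**12]], H=2)
```

Clipping at $H - h$ keeps each estimate an upper bound on what is achievable from step $h$ while preventing large early bonuses from inflating values without limit.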

Paper Structure

This paper contains 60 sections, 23 theorems, 194 equations, 3 figures, 2 tables, 2 algorithms.

Key Result

Lemma 4.1

With probability at least $1 - \delta$, the number of communication rounds of Fed-UCBVI is bounded by …, where logarithmic dependence on $|\mathcal{S}|$, $|\mathcal{A}|$, $H$, $1/\delta$ and $M$ is ignored.

Figures (3)

  • Figure 1: Common regret (lower is better) for $M = 20$ agents as a function of $T$ for different $\varepsilon_{\mathsf{p}}$: crosses represent Fed-UCBVI and circles represent FedQ-Bernstein.
  • Figure 2: Common regret (lower is better) as a function of $M$ for different $\varepsilon_{\mathsf{p}}$, on a log-log scale, at $T = 3 \cdot 10^4$ for GridWorld and $T = 3 \cdot 10^3$ for the synthetic setting: crosses represent Fed-UCBVI and circles represent FedQ-Bernstein.
  • Figure 3: Number of communication rounds (lower is better) as a function of $M$ for different $\varepsilon_{\mathsf{p}}$, at $T = 3 \cdot 10^4$ for GridWorld and $T = 3 \cdot 10^3$ for the synthetic setting: crosses represent Fed-UCBVI and circles represent FedQ-Bernstein.

Theorems & Definitions (38)

  • Lemma 4.1: Communication Complexity
  • Theorem 4.1
  • Lemma C.1
  • Proof
  • Lemma C.2
  • Proof
  • Lemma D.1: Lemma 14 of Zhang et al. (2021)
  • Proof
  • Lemma D.2
  • Proof
  • ...and 28 more