Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents

Safwan Labbi; Daniil Tiapkin; Lorenzo Mancini; Paul Mangold; Eric Moulines

TL;DR

Unlike existing federated reinforcement learning approaches, the communication complexity of Fed-UCBVI increases only marginally with the number of agents, and its regret scales as $\tilde{\mathcal{O}}(\sqrt{H^3 |\mathcal{S}| |\mathcal{A}| T / M})$, plus a small additional term due to heterogeneity.

Abstract

We present the Federated Upper Confidence Bound Value Iteration algorithm ($\texttt{Fed-UCBVI}$), a novel extension of the $\texttt{UCBVI}$ algorithm (Azar et al., 2017) tailored to the federated learning framework. We prove that the regret of $\texttt{Fed-UCBVI}$ scales as $\tilde{\mathcal{O}}(\sqrt{H^3 |\mathcal{S}| |\mathcal{A}| T / M})$, plus a small additional term due to heterogeneity, where $|\mathcal{S}|$ is the number of states, $|\mathcal{A}|$ is the number of actions, $H$ is the episode length, $M$ is the number of agents, and $T$ is the number of episodes. Notably, in the single-agent setting, this upper bound matches the minimax lower bound up to polylogarithmic factors, while in the multi-agent setting, $\texttt{Fed-UCBVI}$ achieves a linear speed-up. To conduct our analysis, we introduce a new measure of heterogeneity, which may be of independent theoretical interest. Furthermore, we show that, unlike existing federated reinforcement learning approaches, the communication complexity of $\texttt{Fed-UCBVI}$ increases only marginally with the number of agents.
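To make the dependence on the number of agents concrete, the following minimal sketch evaluates only the leading term $\sqrt{H^3 |\mathcal{S}| |\mathcal{A}| T / M}$ of the stated bound. It is an illustration under assumptions, not the paper's algorithm: the function name `leading_regret_term` and the numerical values are hypothetical, and the polylogarithmic factors, constants, and heterogeneity term are deliberately ignored.

```python
import math


def leading_regret_term(H, S, A, T, M=1):
    """Leading term sqrt(H^3 * S * A * T / M) of the stated regret bound.

    Hypothetical helper for illustration only: it drops the polylogarithmic
    factors, the constants, and the additional heterogeneity term.
    """
    return math.sqrt(H**3 * S * A * T / M)


# Illustrative (made-up) problem sizes.
single_agent = leading_regret_term(H=10, S=20, A=5, T=10_000, M=1)
four_agents = leading_regret_term(H=10, S=20, A=5, T=10_000, M=4)

# With everything else fixed, going from 1 to M agents shrinks the
# leading term by a factor of sqrt(M).
ratio = single_agent / four_agents
print(f"M=1: {single_agent:.1f}, M=4: {four_agents:.1f}, ratio: {ratio:.2f}")
```

In this sketch, multiplying $M$ by four halves the leading term, so keeping the term fixed allows $T$ to shrink in proportion to $M$; this is one way to read the linear speed-up claim.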
