Federated Learning for Heterogeneous Bandits with Unobserved Contexts

Jiabin Lin, Shana Moothedath

TL;DR

This work tackles federated contextual bandits in which the agents do not observe the exact contexts. It introduces Fed-PECD, a phased elimination algorithm that uses context distributions and federated communication to learn a common set of arm parameters across heterogeneous agents while keeping local data private. The authors derive a high-probability regret bound of $R(T)=O\left( \frac{L}{\ell} \sqrt{d K M T\left( \log\left( \frac{K \log T}{\delta}\right) + \min\{d,\log M\} \right)}\right)$ and a corresponding communication cost of $O\left(M d^2 K \log T\right)$, demonstrating scalability with the number of agents. Empirical results on synthetic data and MovieLens show that collaboration accelerates learning and that the exact-context variant has a noticeable advantage over the hidden-context variant, supporting the practical effectiveness of the federated approach in environments with noisy or predicted contexts.
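To get a feel for how the stated bound behaves, the expression inside the $O(\cdot)$ can be evaluated numerically. The sketch below is illustrative only: it sets the hidden constant to 1, treats $R(T)$ as the total regret summed over the $M$ agents (an assumption), and uses hypothetical values for $d$, $K$, $T$, $\delta$, $L$, and $\ell$ that do not come from the paper.

```python
import math

def regret_bound(d, K, M, T, delta, L=1.0, ell=1.0):
    """Expression inside O(.) of the regret bound, hidden constant set to 1."""
    log_term = math.log(K * math.log(T) / delta) + min(d, math.log(M))
    return (L / ell) * math.sqrt(d * K * M * T * log_term)

def communication_cost(d, K, M, T):
    """Expression inside O(.) of the communication cost, hidden constant set to 1."""
    return M * d**2 * K * math.log(T)

# Hypothetical problem sizes, chosen only for illustration.
for M in (50, 100, 150):
    total = regret_bound(d=5, K=10, M=M, T=10_000, delta=0.05)
    # Dividing by M gives a per-agent figure (assumes R(T) is a total over agents).
    print(f"M={M:4d}  total={total:10.1f}  per-agent={total / M:8.1f}")
```

Under these assumptions the total grows roughly like $\sqrt{M}$, so the per-agent figure shrinks roughly like $1/\sqrt{M}$ as agents are added.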

Abstract

We study the problem of federated stochastic multi-armed contextual bandits with unknown contexts, in which M agents face different bandits and collaborate to learn. The communication model consists of a central server, with which the agents periodically share their estimates in order to learn to choose optimal actions and minimize the total regret. We assume that the exact contexts are not observable and that the agents observe only a distribution of the contexts. Such a situation arises, for instance, when the context itself is a noisy measurement or is based on a prediction mechanism. Our goal is to develop a distributed and federated algorithm that facilitates collaborative learning among the agents to select a sequence of optimal actions so as to maximize the cumulative reward. By performing a feature vector transformation, we propose an elimination-based algorithm and prove a regret bound for linearly parametrized reward functions. Finally, we validate the performance of our algorithm and compare it with a baseline approach using numerical simulations on synthetic data and on the real-world MovieLens dataset.
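The abstract does not spell out the feature vector transformation. One common choice when only a context distribution is observed is to replace the unknown feature vector with its expectation under that distribution. The sketch below assumes that reading, a finite context set, and a hypothetical feature map `phi`; none of these names or values come from the paper.

```python
def expected_features(context_probs, feature_map, arm):
    """psi(mu, a) = sum_c mu(c) * phi(c, a) over a finite context set {0, 1, ...}."""
    dim = len(feature_map(0, arm))
    psi = [0.0] * dim
    for c, p in enumerate(context_probs):
        feats = feature_map(c, arm)
        for i in range(dim):
            psi[i] += p * feats[i]
    return psi

def linear_reward(theta, features):
    """Mean reward of a linearly parametrized model: <theta, features>."""
    return sum(t * x for t, x in zip(theta, features))

# Hypothetical feature map and context distribution, for illustration only.
def phi(c, a):
    return [1.0, float(c), float(c * a)]

mu = [0.2, 0.5, 0.3]          # observed distribution over contexts 0, 1, 2
theta = [0.1, -0.4, 0.25]     # hypothetical arm parameter
psi = expected_features(mu, phi, arm=2)
print(psi, linear_reward(theta, psi))
```

Because the reward is linear in the features, the reward of the transformed vector equals the expected reward over the hidden context, which is why an algorithm can operate on `psi` as if it were an observed feature vector.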

Paper Structure (13 sections, 7 theorems, 37 equations, 1 figure, 2 algorithms)

This paper contains 13 sections, 7 theorems, 37 equations, 1 figure, 2 algorithms.

Key Result

Theorem 1

Consider a time horizon $T$ that consists of $H$ phases with $f_p = cn^p$, where $c$ and $n > 1$ are fixed integers and $n^p$ denotes the $p$-th power of $n$. Let where $k > 1$ is a number satisfying $kd \geqslant 2 \log\left( KH / \delta\right) + d \log\left( ke\right)$. Then, with probability (w.p.) at least $1 - \delta$, the cumulative regret of our algorithm scales as $R(T)=O\left( \frac{L}{\ell} \sqrt{d K M T\left( \log\left( \frac{K \log T}{\delta}\right) + \min\{d,\log M\} \right)}\right)$ and the communication cost scales as $O\left(M d^2 K \log T\right)$.
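The geometric schedule $f_p = cn^p$ can be illustrated with a short sketch. It reads $f_p$ as the length of phase $p$, starts indexing at $p = 1$, and truncates the last phase so the lengths sum to $T$; all three are assumptions made for illustration, and the values of `T`, `c`, and `n` are hypothetical.

```python
def phase_lengths(T, c, n):
    """Phase lengths f_p = c * n**p for p = 1, 2, ..., with the last phase cut to fit T."""
    lengths = []
    p, used = 1, 0
    while used < T:
        f_p = min(c * n**p, T - used)  # truncate the final phase at the horizon
        lengths.append(f_p)
        used += f_p
        p += 1
    return lengths

for T in (1_000, 1_000_000):
    schedule = phase_lengths(T, c=1, n=2)
    print(f"T={T}: H={len(schedule)} phases")
```

Because the lengths grow geometrically, the number of phases $H$ grows only logarithmically in $T$. If agents synchronize with the server once per phase (also an assumption here), this is consistent with the $\log T$ factor in the communication cost.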

Figures (1)

  • Figure 1: Per-agent (average) cumulative regret $R(T)$ versus time $T$. We compare the performance of our algorithm with the Fed-PE algorithm of \cite{huang2021federated} (the variant in which the actual context is observable, i.e., exact). We perform the experiments on both the synthetic data and the MovieLens data. Synthetic data: Figure \ref{fig:3} compares the two variants, exact and hidden, and Figure \ref{fig:4} compares different numbers of agents, $M=50, 100, 150$. MovieLens data: Figure \ref{fig:5} compares the two variants, exact and hidden, and Figure \ref{fig:6} compares different numbers of agents, $M=50, 100, 150$. As expected, exact outperforms the hidden setting. The figures also show that the per-agent regret decreases as the number of agents increases, validating the benefit of collaborative learning.

Theorems & Definitions (10)

  • Theorem 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma .5
  • proof
  • Lemma .6
  • proof
  • Lemma .7
  • proof