Harnessing the Power of Federated Learning in Federated Contextual Bandits

Chengshuai Shi; Ruida Zhou; Kun Yang; Cong Shen

TL;DR

The paper addresses the gap between canonical Federated Learning (FL) and Federated Contextual Bandits (FCB) by proposing FedIGW, a design that couples an inverse gap weighting (IGW) contextual bandit (CB) algorithm with flexible FL protocols. FedIGW operates in epochs: in each epoch, an FL routine learns a reward model from the agents' distributed interaction data, and that model drives the IGW action selection in the next epoch. This decouples the FL and CB components, so any FL protocol (e.g., FedAvg, SCAFFOLD) and FL appendages such as personalization, robustness, and privacy can be plugged in. The theoretical guarantees bound the global regret Reg(T) in terms of the FL excess risk, so advances in FL convergence analyses carry over directly; concrete corollaries cover finite and linear reward function classes, and the discussion extends to non-linear settings. Experiments on the Bibtex and Delicious datasets show that FedIGW with various FL backbones outperforms baselines such as FN-UCB. Overall, FedIGW provides a principled, modular bridge that opens the broader FL literature to FCB, with implications for personalized, private, and robust federated sequential decision-making.
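The IGW step mentioned above can be illustrated with a minimal sketch. It uses the standard inverse gap weighting rule from the regression-based CB literature: each non-greedy action is played with probability inversely proportional to its estimated reward gap, and the greedy action receives the remaining mass. The function name `igw_probabilities` and the plain-list interface are assumptions for illustration, not the paper's code; `predicted_rewards` stands in for the output of the FL-learned reward model on one context.

```python
def igw_probabilities(predicted_rewards, gamma):
    """Inverse gap weighting over one agent's actions (illustrative sketch).

    predicted_rewards: estimated reward of each action for the current context
                       (hypothetical stand-in for the FL-learned reward model).
    gamma: exploration parameter; larger values concentrate on the greedy action.
    """
    num_actions = len(predicted_rewards)
    # Greedy action under the current reward estimates.
    best = max(range(num_actions), key=lambda a: predicted_rewards[a])

    probs = [0.0] * num_actions
    for a in range(num_actions):
        if a != best:
            gap = predicted_rewards[best] - predicted_rewards[a]
            # Non-greedy actions: probability shrinks as the estimated gap grows.
            probs[a] = 1.0 / (num_actions + gamma * gap)
    # The greedy action takes whatever probability mass is left.
    probs[best] = 1.0 - sum(probs)
    return probs


if __name__ == "__main__":
    print(igw_probabilities([0.2, 0.5, 0.1], gamma=10.0))
```

With `gamma = 0` the rule reduces to uniform exploration, and as `gamma` grows the distribution concentrates on the greedy action, which is why the exploration parameter is tied to the quality of the learned reward model.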

Abstract

Federated learning (FL) has demonstrated great potential in revolutionizing distributed machine learning, and tremendous efforts have been made to extend it beyond the original focus on supervised learning. Among many directions, federated contextual bandits (FCB), a pivotal integration of FL and sequential decision-making, has garnered significant attention in recent years. Despite substantial progress, existing FCB approaches have largely employed their tailored FL components, often deviating from the canonical FL framework. Consequently, even renowned algorithms like FedAvg remain under-utilized in FCB, let alone other FL advancements. Motivated by this disconnection, this work takes one step towards building a tighter relationship between the canonical FL study and the investigations on FCB. In particular, a novel FCB design, termed FedIGW, is proposed to leverage a regression-based CB algorithm, i.e., inverse gap weighting. Compared with existing FCB approaches, the proposed FedIGW design can better harness the entire spectrum of FL innovations, which is concretely reflected as (1) flexible incorporation of (both existing and forthcoming) FL protocols; (2) modularized plug-in of FL analyses in performance guarantees; (3) seamless integration of FL appendages (such as personalization, robustness, and privacy). We substantiate these claims through rigorous theoretical analyses and empirical evaluations.

Paper Structure (37 sections, 26 theorems, 95 equations, 6 figures, 5 tables, 3 algorithms)

Introduction
Federated Contextual Bandits
Problem Formulation
The Current Disconnection Between FCB and FL
FedIGW: Flexible Incorporation of FL Protocols
System Model
Algorithm Design
Theoretical Guarantees: Modularized Plug-in of FL Analyses
A General Guarantee
Some Concretized Discussions
Experimental Results
Flexible Extensions: Seamless Integration of FL Appendages
Personalized Learning
Robustness, Privacy, and Beyond
Conclusions
...and 22 more sections

Key Result

Theorem 4.1

Using a learning rate $\gamma^l = O\left(\sqrt{\sum_{m\in [M]}E^{l-1}_m K_m/(\sum_{m\in [M]}E^{l-1}_m\mathcal{E}(E^{l-1}_{[M]}))} \right)$ in epoch $l$, and denoting $\bar{K}^l := \sum_{m\in [M]}E^l_m K_m/\sum_{m\in [M]}E^l_m$, the regret of FedIGW can be bounded as […]. Here $\mathcal{E}(E^{l}_{[M]})$ (abbreviated from $\mathcal{E}(\mathcal{F}; E^{l}_{[M]})$) denotes the excess risk of the output from t
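To make the quantities in Theorem 4.1 concrete, the following sketch computes the epoch-length-weighted action count $\bar{K}^l$ and a learning rate of the stated order. The constant `c` hidden by the $O(\cdot)$ notation, the function names, and the example numbers are all assumptions for illustration; the excess risk value would in practice come from the analysis of the chosen FL protocol.

```python
import math


def weighted_action_count(epoch_lengths, num_actions):
    """K-bar: average of K_m weighted by each agent's epoch length E_m."""
    total = sum(epoch_lengths)
    return sum(e * k for e, k in zip(epoch_lengths, num_actions)) / total


def igw_learning_rate(prev_epoch_lengths, num_actions, excess_risk, c=1.0):
    """gamma^l = c * sqrt(sum_m E_m K_m / (sum_m E_m * excess_risk)).

    prev_epoch_lengths: E^{l-1}_m for each agent m.
    num_actions: K_m for each agent m.
    excess_risk: excess risk of the FL output on the previous epoch's data.
    c: hypothetical constant absorbed by the O(.) in the theorem.
    """
    total = sum(prev_epoch_lengths)
    weighted = sum(e * k for e, k in zip(prev_epoch_lengths, num_actions))
    return c * math.sqrt(weighted / (total * excess_risk))


if __name__ == "__main__":
    lengths = [100, 300]   # illustrative epoch lengths for two agents
    actions = [4, 8]       # illustrative action-set sizes K_m
    print(weighted_action_count(lengths, actions))
    print(igw_learning_rate(lengths, actions, excess_risk=0.07))
```

Written this way, the learning rate equals `c * sqrt(K_bar / excess_risk)` for the previous epoch, so a smaller excess risk from the FL routine yields a larger $\gamma^l$ and hence less exploration.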

Figures (6)

Figure 1: The FCB design principle of periodically alternating between the employed CB and FL components.
Figure 2: Comparison between the FL components in existing FCB approaches and the FedIGW design proposed in this work, where the former requires tailored FL protocols while the latter can flexibly leverage both existing and forthcoming protocols in canonical FL studies. Additional comparisons regarding the FL components can be found in Appendix \ref{subapp:example}.
Figure 3: The averaged reward collected by each agent via FedIGW (using different FL protocols), the state-of-the-art FN-UCB, and two other naive baselines (i.e., greedy and softmax using FedAvg) with $M = 10$ participating agents on Bibtex (left) and Delicious (right) datasets.
Figure 4: Different Styles of Connecting FL and CB in FCB.
Figure 5: The averaged reward collected by each agent via FedIGW (with FedAvg and $M = 10$ participating agents) and two single-agent baselines on Bibtex (left) and Delicious (right) datasets.
...and 1 more figure

Theorems & Definitions (44)

Remark 3.2
Theorem 4.1
Corollary 4.2: A Finite Function Class
Lemma 4.3
Corollary 4.4: Modularized Plug-in of FL Analyses; A Simplified Version of Corollary \ref{col:convex_raw_full}
Remark 4.5: A Linear Reward Function Class
Remark 4.6: Beyond Linear Reward Functions
Remark 6.3: A Linear Reward Function Class
Definition C.1
Lemma C.2
...and 34 more
