Fairness and Privacy Guarantees in Federated Contextual Bandits

Sambhav Solanki; Shweta Jain; Sujit Gujar

TL;DR

The paper addresses fairness and privacy in federated contextual bandits. It formalizes fairness of exposure as allocating exposure to each action in proportion to its merit, and it introduces two algorithms, Fed-FairX-LinUCB and Priv-FairX-LinUCB. It derives sub-linear fairness regret bounds using a bounded-communication protocol, and federated differential privacy guarantees using a tree-based privatizer. Experiments on synthetic data show that collaborative learning improves fairness performance over single-agent learning, that both algorithms achieve near-optimal fairness regret, and that the privacy budget provides a tunable trade-off between protection and regret. Overall, the work provides a principled framework for fair, privacy-preserving collaboration in distributed contextual bandit settings, with practical implications for crowdsourcing and recommender systems.

Abstract

This paper considers the contextual multi-armed bandit (CMAB) problem with fairness and privacy guarantees in a federated environment. We take merit-based exposure as the desired fair outcome: each action receives exposure in proportion to its associated reward. We measure an algorithm's effectiveness by its fairness regret, which captures the difference between the fair optimal policy and the policy output by the algorithm. Applying a fair CMAB algorithm to each agent individually leads to fairness regret that is linear in the number of agents. We propose that collaborative (federated) learning can be more effective, and we provide the algorithm Fed-FairX-LinUCB, which also ensures differential privacy. The primary challenge in extending the existing privacy framework is designing the protocol for communicating the required information across agents. A naive protocol can lead to either weaker privacy guarantees or higher regret. We design a novel communication protocol that allows for (i) sub-linear theoretical bounds on fairness regret for Fed-FairX-LinUCB and comparable bounds for its private counterpart, Priv-FairX-LinUCB (relative to single-agent learning), and (ii) effective use of the privacy budget in Priv-FairX-LinUCB. We demonstrate the efficacy of the proposed algorithms with extensive simulation-based experiments, which show that both Fed-FairX-LinUCB and Priv-FairX-LinUCB achieve near-optimal fairness regret.
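The two notions in the abstract, merit-proportional exposure and fairness regret, can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this summary: merits are given directly as positive numbers, and the "difference" between policies is measured as a per-round L1 distance. The function names `fair_exposure_policy` and `fairness_regret` and the numeric values are hypothetical and are not taken from the paper.

```python
import numpy as np

def fair_exposure_policy(merits):
    """Return selection probabilities proportional to (strictly positive) merits."""
    merits = np.asarray(merits, dtype=float)
    if np.any(merits <= 0):
        raise ValueError("merits must be strictly positive")
    return merits / merits.sum()

def fairness_regret(optimal_policies, played_policies):
    """Cumulative L1 distance between fair optimal and played policies.

    Assumption: the L1 form is used here for illustration only; the paper's
    exact definition is not reproduced in this summary.
    """
    opt = np.asarray(optimal_policies, dtype=float)
    played = np.asarray(played_policies, dtype=float)
    return float(np.abs(opt - played).sum())

# Hypothetical example: three actions with true merits 1, 2 and 5.
true_merits = [1.0, 2.0, 5.0]
pi_star = fair_exposure_policy(true_merits)        # fair optimal policy
pi_est = fair_exposure_policy([1.0, 1.0, 1.0])     # uninformed (uniform) estimate

# Round 1 plays the uniform policy, round 2 plays the fair optimal policy.
regret = fairness_regret([pi_star, pi_star], [pi_est, pi_star])
print(pi_star, regret)
```

In this toy setting the regret comes entirely from the first round, where the learner has no merit estimates yet; a learner whose estimates converge to the true merits would stop accumulating fairness regret.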

Paper Structure (30 sections, 9 theorems, 11 equations, 2 figures, 2 algorithms)

Introduction
Related Work
Model Preliminaries
Setting and Notations
Why fairness of exposure?
Fairness in Single-Agent Contextual MAB
Privacy requirements
Goal:
Multi-Agent Fair and Private Contextual Bandit Algorithm
Fed-FairX-LinUCB
Priv-FairX-LinUCB
Theoretical Analysis
Regret Analysis
Privacy Guarantees
Experimental Analysis
...and 15 more sections

Key Result

Lemma 1

(Elliptical Potential; Shariff and Sheffet, 2018). Let $x_1, \ldots, x_n \in \mathbb{R}^d$ be vectors with each $\|x_t\| \leq L$. Given a positive definite matrix $U_1 \in \mathbb{R}^{d \times d}$, define $U_{t+1} := U_t + x_t x_t^{\top}$ for all $t$. Then $\sum_{t=1}^n \min \{1, \|x_t\| \ldots$
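The lemma statement above is cut off, so the following sketch does not reproduce the paper's exact bound. As an assumption, it checks numerically the standard form of the elliptical potential lemma, $\sum_{t=1}^n \min\{1, \|x_t\|_{U_t^{-1}}^2\} \leq 2 \log(\det U_{n+1} / \det U_1)$, using the recursion $U_{t+1} = U_t + x_t x_t^{\top}$ from the text. The dimension, horizon, random vectors, and the choice $U_1 = I$ are hypothetical values picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, L = 4, 200, 1.0            # hypothetical dimension, horizon, norm bound

U = np.eye(d)                    # U_1: any positive definite matrix (assumed identity)
logdet_U1 = np.linalg.slogdet(U)[1]

potential = 0.0
for _ in range(n):
    x = rng.normal(size=d)
    x *= L / max(L, np.linalg.norm(x))         # rescale so that ||x_t|| <= L
    # Squared norm of x_t weighted by U_t^{-1}, i.e. x_t^T U_t^{-1} x_t.
    weighted_sq_norm = x @ np.linalg.solve(U, x)
    potential += min(1.0, weighted_sq_norm)
    U = U + np.outer(x, x)                     # U_{t+1} = U_t + x_t x_t^T

# Assumed standard bound: 2 * log(det U_{n+1} / det U_1).
bound = 2.0 * (np.linalg.slogdet(U)[1] - logdet_U1)
print(potential, bound)
```

The check relies on two facts: $\det U_{t+1} = \det U_t \,(1 + \|x_t\|_{U_t^{-1}}^2)$ and $\min\{1, z\} \leq 2\log(1+z)$ for $z \geq 0$. Summing the second over $t$ and telescoping the first yields the bound.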

Figures (2)

Figure 1: (a) Exp 1: Fairness Regret vs. Rounds for the single-agent baseline and the proposed federated learning algorithms (m=10); (b) Exp 2: Fairness Regret vs. Rounds for different communication protocol baselines and the proposed algorithms (m=10); (c) Exp 3: Fairness Regret trend w.r.t. number of agents (t=100,000); (d) Exp 4: Fairness Regret trend w.r.t. privacy budget (t=100,000)
Figure 2: (a) Exp 1: Reward Regret vs. Rounds for the single-agent baseline and the proposed federated learning algorithms (m=10); (b) Exp 2: Reward Regret vs. Rounds for different communication protocol baselines and the proposed algorithms (m=10); (c) Exp 3: Reward Regret trend w.r.t. number of agents (t=100,000); (d) Exp 4: Reward Regret trend w.r.t. privacy budget (t=100,000)

Theorems & Definitions (15)

Definition 1
Definition 2
Lemma 1
Lemma 2
Lemma 3
Theorem 1
Lemma 4
Theorem 2
Claim 1
Lemma 5
...and 5 more
