Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits

Tianyi Xu, Jiaxin Liu, Nicholas Mattei, Zizhan Zheng

TL;DR

This work tackles fair allocation in multi-agent multi-armed bandits by introducing a probing stage that gathers information about a subset of arms before assignment. It optimizes fairness via Nash Social Welfare (NSW) and develops both offline and online solutions: a greedy probing algorithm with submodularity-based guarantees in the offline setting and an online probing-enabled UCB method achieving sublinear regret under fairness constraints. Theoretical results show that probing can be integrated with NSW to balance exploration, exploitation, and equity, while experiments on synthetic and real data demonstrate improved fairness and efficiency over baselines. The framework has practical implications for ridesharing and wireless scheduling, where equitable access to profitable opportunities is as important as overall system performance.

Abstract

We propose a multi-agent multi-armed bandit (MA-MAB) framework aimed at ensuring fair outcomes across agents while maximizing overall system performance. A key challenge in this setting is decision-making under limited information about arm rewards. To address this, we introduce a novel probing framework that strategically gathers information about selected arms before allocation. In the offline setting, where reward distributions are known, we leverage submodular properties to design a greedy probing algorithm with a provable performance bound. For the more complex online setting, we develop an algorithm that achieves sublinear regret while maintaining fairness. Extensive experiments on synthetic and real-world datasets show that our approach outperforms baseline methods, achieving better fairness and efficiency.
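As a rough illustration of the two ingredients named above, the following sketch computes Nash Social Welfare as the product of per-agent utilities (the standard definition) and runs a generic greedy selection under a cardinality budget. It is not the paper's algorithm: the function names, the toy coverage objective, and the budget parameter are all assumptions made for illustration, and the paper's actual probing objective is not reproduced here.

```python
import math

def nash_social_welfare(utilities):
    """Standard NSW: the product of per-agent utilities.

    The product drops to zero if any agent receives nothing, which is why
    maximizing it favors balanced allocations.
    """
    return math.prod(utilities)

def greedy_select(candidates, value, budget):
    """Generic greedy subset selection (illustrative, not the paper's method).

    Repeatedly adds the candidate with the largest marginal gain under the
    set function `value`, stopping at `budget` items or when no candidate
    yields a positive gain.
    """
    chosen = set()
    for _ in range(budget):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        base = value(chosen)
        best = max(remaining, key=lambda c: value(chosen | {c}) - base)
        if value(chosen | {best}) - base <= 0:
            break
        chosen.add(best)
    return chosen

# Hypothetical stand-in objective: a coverage function, which is monotone
# and submodular. Each "arm" covers a set of labels.
COVERAGE = {0: {"a", "b"}, 1: {"b"}, 2: {"c"}}

def coverage_value(selected):
    covered = set()
    for arm in selected:
        covered |= COVERAGE[arm]
    return len(covered)

balanced = nash_social_welfare([0.5, 0.5])
skewed = nash_social_welfare([0.9, 0.1])
probe_set = greedy_select(list(COVERAGE), coverage_value, budget=2)
```

In this toy run both allocations have the same total utility, but the balanced one has the higher NSW, and greedy picks arm 0 first (gain 2) and then arm 2 (gain 1).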

Paper Structure

This paper contains 29 sections, 19 theorems, 181 equations, 2 figures, 2 algorithms.

Key Result

Lemma 1

For any $S \subseteq T \subseteq [A]$, we have $g(S) \le g(T)$.
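The monotonicity property in Lemma 1 can be checked by brute force on a small ground set. The sketch below is only a sanity-check harness: the definition of $g$ is not given in this summary, so `toy_g` and its `VALUES` table are hypothetical stand-ins, not the paper's function.

```python
from itertools import combinations

def is_monotone(ground, g):
    """Check g(S) <= g(T) for every pair S ⊆ T ⊆ ground (exponential cost)."""
    subsets = [
        frozenset(c)
        for r in range(len(ground) + 1)
        for c in combinations(ground, r)
    ]
    return all(g(S) <= g(T) for S in subsets for T in subsets if S <= T)

# Hypothetical per-arm values; toy_g returns the best value in the set.
VALUES = {0: 0.2, 1: 0.7, 2: 0.5}

def toy_g(S):
    return max((VALUES[a] for a in S), default=0.0)

def parity(S):
    # A deliberately non-monotone function for contrast.
    return len(S) % 2

monotone_ok = is_monotone(list(VALUES), toy_g)
parity_ok = is_monotone(list(VALUES), parity)
```

The check enumerates all subset pairs, so it is only practical for a handful of arms.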

Figures (2)

  • Figure 1: (a) $M=12$ agents, $A=8$ arms, Bernoulli reward distribution. (b) $M=20$ agents, $A=10$ arms, Bernoulli reward distribution. (c) $M=12$ agents, $A=8$ arms, general reward distribution. (d) $M=20$ agents, $A=10$ arms, general reward distribution. Data from NYYellowTaxi 2016.
  • Figure 2: Scalability analysis along two dimensions: (a) fixed number of arms $A=8$ with a varying number of agents; (b) fixed number of agents $M=20$ with a varying number of arms.

Theorems & Definitions (29)

  • Lemma 1: Monotonicity of $g(S)$
  • Lemma 2: Monotonicity
  • Lemma 3: Submodularity
  • Lemma 4: Monotonicity of $h(S)$
  • Lemma 5
  • Theorem 1
  • Lemma 6: Smoothness of the NSW Objective
  • Lemma 7: Concentration of Reward Estimates
  • Theorem 2
  • Lemma 1: Monotonicity of $g(S)$
  • ...and 19 more