Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits
Tianyi Xu, Jiaxin Liu, Nicholas Mattei, Zizhan Zheng
TL;DR
This work tackles fair allocation in multi-agent multi-armed bandits by introducing a probing stage that gathers information about a subset of arms before assignment. It adopts Nash Social Welfare (NSW) as the fairness objective and develops both offline and online solutions. In the offline setting, it gives a greedy probing algorithm with submodularity-based guarantees. In the online setting, it gives a probing-enabled UCB method that achieves sublinear regret under fairness constraints. Theoretical results show that probing can be integrated with NSW to balance exploration, exploitation, and equity, while experiments on synthetic and real data demonstrate improved fairness and efficiency over baselines. The framework has practical implications for ridesharing and wireless scheduling, where equitable access to profitable opportunities is as important as overall system performance.
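For reference, the standard Nash Social Welfare of N agents with nonnegative expected utilities u_1, ..., u_N is the product of those utilities, or equivalently the sum of their logarithms. The notation below is generic: π stands for an allocation or policy, and the exact utility model used in the paper is not reproduced here. The two-agent comparison shows why NSW favors balanced outcomes over a plain sum.

```latex
\mathrm{NSW}(\pi) = \prod_{i=1}^{N} u_i(\pi),
\qquad
\log \mathrm{NSW}(\pi) = \sum_{i=1}^{N} \log u_i(\pi).

\text{Two agents, total utility } 10:\quad
(u_1, u_2) = (1, 9) \Rightarrow \mathrm{NSW} = 1 \cdot 9 = 9,
\qquad
(u_1, u_2) = (5, 5) \Rightarrow \mathrm{NSW} = 5 \cdot 5 = 25.
```

Both outcomes have the same utilitarian sum, but NSW strictly prefers the balanced one.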
Abstract
We propose a multi-agent multi-armed bandit (MA-MAB) framework aimed at ensuring fair outcomes across agents while maximizing overall system performance. A key challenge in this setting is decision-making under limited information about arm rewards. To address this, we introduce a novel probing framework that strategically gathers information about selected arms before allocation. In the offline setting, where reward distributions are known, we leverage submodular properties to design a greedy probing algorithm with a provable performance bound. For the more complex online setting, we develop an algorithm that achieves sublinear regret while maintaining fairness. Extensive experiments on synthetic and real-world datasets show that our approach outperforms baseline methods, achieving better fairness and efficiency.
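The greedy probing idea can be illustrated with the classical greedy rule for maximizing a monotone submodular set function under a budget: repeatedly add the arm with the largest marginal gain. This is a minimal sketch, not the paper's algorithm. The objective `log_nsw_objective`, its per-agent `base` utilities, and the per-arm `gain` table are hypothetical stand-ins, chosen so that the objective (a log-NSW, i.e. a sum of logarithms) is monotone and submodular. For normalized monotone submodular objectives under a cardinality constraint, this greedy rule is classically known to reach at least a (1 - 1/e) fraction of the optimum. The paper's own bound may differ.

```python
import math
from typing import Callable, FrozenSet, List, Sequence

SetFunction = Callable[[FrozenSet[int]], float]


def log_nsw_objective(base: Sequence[float], gain: Sequence[Sequence[float]]) -> SetFunction:
    """Hypothetical toy objective: f(S) = sum_i log(base[i] + sum_{j in S} gain[i][j]).

    With base[i] > 0 and gain[i][j] >= 0, f is monotone and submodular,
    because it is a sum of concave, increasing functions of nonnegative additive scores.
    """
    def f(probed: FrozenSet[int]) -> float:
        return sum(
            math.log(b + sum(g[j] for j in probed))
            for b, g in zip(base, gain)
        )
    return f


def greedy_probe(f: SetFunction, arms: Sequence[int], budget: int) -> List[int]:
    """Greedily pick up to `budget` distinct arms, each time adding the largest marginal gain."""
    chosen: FrozenSet[int] = frozenset()
    order: List[int] = []
    for _ in range(min(budget, len(arms))):
        current = f(chosen)
        best_arm, best_gain = None, -math.inf
        for arm in arms:
            if arm in chosen:
                continue
            marginal = f(chosen | {arm}) - current
            if marginal > best_gain:  # strict '>' keeps the first arm on ties
                best_arm, best_gain = arm, marginal
        if best_arm is None or best_gain <= 0:
            break  # no remaining arm improves the objective
        chosen = chosen | {best_arm}
        order.append(best_arm)
    return order


if __name__ == "__main__":
    base = [1.0, 1.0]
    gain = [[5.0, 5.0, 0.0],   # agent 0 benefits from arms 0 and 1
            [0.0, 0.0, 3.0]]   # agent 1 benefits only from arm 2
    print(greedy_probe(log_nsw_objective(base, gain), [0, 1, 2], budget=2))  # prints [0, 2]
```

In this toy instance, the log-NSW objective probes arms 0 and 2, so both agents gain something. Running the same greedy rule on a plain sum of utilities would probe arms 0 and 1 instead, leaving agent 1 with no gain.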
