Envious Explore and Exploit
Omer Ben-Porat, Yotam Gafni, Or Markovetzki
TL;DR
This work analyzes envy in multi-agent, bandit-like explore-and-exploit systems with reward consistency. It considers three arrival-order mechanisms (uniform, nudged, and adversarial) and measures envy as the maximal cumulative reward gap between agents under anonymous algorithms. It derives tight upper and lower bounds on envy under uniform arrival using martingale- and excursion-based analysis, shows that nudging can bound envy independently of the horizon, and proves that envy grows linearly under adversarial arrival. The paper also initiates the study of the welfare-envy tradeoff, presenting a welfare-optimal scheme for two agents and envy-bounded variants that achieve welfare gains while constraining envy, and validates the results empirically through simulations. The results have practical implications for fairness in recommendation systems and offer a foundation for extensions to more complex, realistic settings, including MDP-like reward dynamics and varied arrival patterns.
Abstract
Explore-and-exploit tradeoffs play a key role in recommendation systems (RSs), aiming at serving users better by learning from previous interactions. Despite their commercial success, the societal effects of explore-and-exploit mechanisms are not well understood, especially regarding the utility discrepancy they generate between different users. In this work, we measure such discrepancy using the economic notion of envy. We present a multi-armed bandit-like model in which every round consists of several sessions, and rewards are realized once per round. We call the latter property reward consistency, and show that the RS can leverage this property for better societal outcomes. On the downside, doing so also generates envy, as late-to-arrive users enjoy the information gathered by early-to-arrive users. We examine the generated envy under several arrival order mechanisms and virtually any anonymous algorithm, i.e., any algorithm that treats all similar users similarly without leveraging their identities. We provide tight envy bounds on uniform arrival and upper bound the envy for nudged arrival, in which the RS can affect the order of arrival by nudging its users. Furthermore, we study the efficiency-fairness trade-off by devising an algorithm that allows constant envy and approximates the optimal welfare in restricted settings. Finally, we validate our theoretical results empirically using simulations.
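To make the setting concrete, the following Python sketch simulates one possible instance of the model described above. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: rewards are assumed i.i.d. Uniform[0, 1] per arm and round, the arrival order is a uniformly random permutation in each round, the threshold rule used by the policy is hypothetical, and envy is taken to be the largest gap in cumulative reward between any two agents at the end of the horizon. The function name `simulate_envy` and its parameters are likewise invented for illustration.

```python
import random


def simulate_envy(num_rounds, num_agents=2, num_arms=2, seed=0):
    """Return the final cumulative-reward gap (max minus min over agents).

    Assumptions (not taken from the paper): Uniform[0, 1] rewards,
    uniform arrival order, and a simple threshold policy.
    """
    rng = random.Random(seed)
    totals = [0.0] * num_agents

    for _ in range(num_rounds):
        # Reward consistency: each arm's reward is realized once per round
        # and is shared by every session in that round.
        rewards = [rng.random() for _ in range(num_arms)]

        # Uniform arrival: a fresh random order of agents each round.
        order = list(range(num_agents))
        rng.shuffle(order)

        revealed = []  # rewards of arms already pulled in this round
        for agent in order:
            best = max(revealed, default=None)
            can_explore = len(revealed) < num_arms
            # Hypothetical anonymous rule: explore the next unpulled arm
            # while the best revealed reward is below the prior mean 0.5;
            # otherwise exploit the best revealed arm. The rule depends
            # only on the arrival position, never on the agent identity.
            if can_explore and (best is None or best < 0.5):
                reward = rewards[len(revealed)]
                revealed.append(reward)
            else:
                reward = best
            totals[agent] += reward

    return max(totals) - min(totals)


if __name__ == "__main__":
    for horizon in (100, 1000, 10000):
        print(horizon, simulate_envy(horizon, seed=1))
```

In this sketch, agents that arrive later in a round can reuse rewards revealed by earlier sessions, which is the source of envy the abstract describes. Running it for several horizons and seeds gives a rough feel for how the gap behaves under uniform arrival in this toy instance.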
