Learning to Mitigate Externalities: the Coase Theorem with Hindsight Rationality
Antoine Scheid, Aymeric Capitaine, Etienne Boursier, Eric Moulines, Michael I Jordan, Alain Durmus
TL;DR
The paper addresses the welfare loss caused by externalities in a two-player sequential bandit game, where welfare is formalized as $W(a,b)=v^{up}(a)+v^{down}(a,b)$ for an upstream action $a$ and a downstream action $b$. It proposes an online Coase-like solution: introducing property rights and transfers lets the downstream player learn the optimal transfers $\tau^{\star}_a$ and bargain to steer the upstream player's action toward the welfare-maximizing outcome. The authors introduce BELGIC, a two-phase algorithm that first performs a batched binary search to estimate $\tau^{\star}_a$ and then runs a bandit subroutine (Bandit-Alg) on the shifted rewards $v^{down}(a,b)-\hat{\tau}_a$. BELGIC achieves sublinear welfare regret and ensures welfare efficiency whenever the upstream player uses a no-regret policy. This work blends mechanism design with online learning, showing that transfers can align incentives even when agents are still learning, with potential extensions to multi-agent settings and online economic systems.
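The two-phase idea can be illustrated with a minimal Python sketch. It is a simplified, noise-free toy and not the authors' BELGIC implementation. It assumes the upstream player best-responds with known values rather than learning with a no-regret policy, that transfers lie in $[0,1]$, and that $\tau^{\star}_a$ is the smallest transfer making action $a$ a best response for the upstream player. The reward tables and all function names are hypothetical.

```python
# Hypothetical mean rewards (illustrative numbers, not taken from the paper).
V_UP = {0: 0.9, 1: 0.4}                      # upstream value v_up(a)
V_DOWN = {(0, 0): 0.1, (0, 1): 0.2,          # downstream value v_down(a, b)
          (1, 0): 0.9, (1, 1): 0.3}


def welfare(a, b):
    """Social welfare W(a, b) = v_up(a) + v_down(a, b)."""
    return V_UP[a] + V_DOWN[(a, b)]


def upstream_accepts(a, tau):
    """Assumed noise-free upstream: plays a iff the transfer makes a a best response."""
    return V_UP[a] + tau >= max(V_UP.values())


def estimate_transfer(a, iters=20):
    """Phase 1 (simplified): binary search on [0, 1] for the smallest
    transfer that induces the upstream player to choose action a."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if upstream_accepts(a, mid):
            hi = mid   # transfer is sufficient, try a smaller one
        else:
            lo = mid   # transfer is too small, increase it
    return hi


def plan_with_transfers():
    """Phase 2 (simplified): pick (a, b) maximizing the shifted reward
    v_down(a, b) - tau_hat[a]; exact maximization stands in for a bandit routine."""
    tau_hat = {a: estimate_transfer(a) for a in V_UP}
    a, b = max(V_DOWN, key=lambda ab: V_DOWN[ab] - tau_hat[ab[0]])
    return a, b, tau_hat


def plan_without_transfers():
    """Baseline: upstream maximizes its own value, downstream best-responds."""
    a = max(V_UP, key=V_UP.get)
    b = max((bb for (aa, bb) in V_DOWN if aa == a), key=lambda bb: V_DOWN[(a, bb)])
    return a, b
```

In this toy instance, calling `plan_with_transfers()` and `plan_without_transfers()` and comparing `welfare(a, b)` for the two outcomes shows how a transfer can move the upstream player away from its selfish action. In the paper's setting each binary-search query would have to be repeated over a batch of rounds, because the upstream player's response is itself learned.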
Abstract
In economic theory, the concept of externality refers to any indirect effect resulting from an interaction between players that affects the social welfare. Most of the models within which externalities have been studied assume that agents have perfect knowledge of their environment and preferences. This is a major hindrance to the practical implementation of many proposed solutions. To address this issue, we consider a two-player bandit setting where the actions of one of the players affect the other player, and we extend the Coase theorem [Coase, 1960]. This result shows that the optimal approach for maximizing the social welfare in the presence of externalities is to establish property rights, i.e., to enable transfers and bargaining between the players. Our work removes the classical assumption that bargainers possess perfect knowledge of the underlying game. We first demonstrate that in the absence of property rights, the social welfare breaks down. We then design a policy for the players that allows them to learn a bargaining strategy that maximizes the total welfare, recovering the Coase theorem under uncertainty.
