Learning to Mitigate Externalities: the Coase Theorem with Hindsight Rationality

Antoine Scheid; Aymeric Capitaine; Etienne Boursier; Eric Moulines; Michael I. Jordan; Alain Durmus

TL;DR

The paper addresses welfare loss from externalities in a two-player sequential bandit, formalizing welfare as $W(a,b)=v^{up}(a)+v^{down}(a,b)$. It proposes an online Coase-like solution by introducing property rights and transfers, enabling the downstream to learn optimal transfers $\tau^{\star}_a$ and bargain to steer the upstream's action toward the welfare-maximizing outcome. The authors introduce BELGIC, a two-phase algorithm that first performs batched binary search to estimate $\tau^{\star}_a$ and then runs a Bandit-Alg on the shifted rewards $v^{down}(a,b)-\hat{\tau}_a$, achieving sublinear welfare regret and ensuring welfare efficiency when the upstream uses any no-regret policy. This work blends mechanism design with online learning, showing that transfers can align incentives even when agents learn, with potential extensions to multi-agent settings and online economic systems.
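The two phases described above can be sketched as follows. This is a minimal illustration, not the authors' BELGIC implementation. It assumes a deterministic, idealized upstream that best-responds immediately (in place of a no-regret learner), known mean rewards in $[0,1]$, an exact argmax in place of the bandit subroutine, and that $\tau^{\star}_a$ is the smallest payment making action $a$ at least as good for the upstream as their preferred action. All function names and the numeric instance are hypothetical.

```python
def upstream_best_response(v_up, target, transfer):
    """Idealized upstream: maximizes own value plus the transfer offered on `target`.
    Ties are broken in favor of `target` (an assumption of this sketch)."""
    payoff = [v + (transfer if a == target else 0.0) for a, v in enumerate(v_up)]
    return max(range(len(v_up)), key=lambda a: (payoff[a], a == target))


def estimate_transfer(v_up, target, hi=1.0, iters=20):
    """Phase 1 (sketch): binary search for the smallest transfer that makes
    the upstream choose `target`. Assumes values in [0, 1], so hi=1 suffices."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if upstream_best_response(v_up, target, mid) == target:
            hi = mid  # offer accepted: try a smaller transfer
        else:
            lo = mid  # offer rejected: a larger transfer is needed
    return hi  # upper end of the bracket, so the offer is still accepted


def best_offer(v_up, v_down):
    """Phase 2 (sketch): maximize the shifted reward v_down(a, b) - tau_hat_a.
    An exact argmax over known means stands in for the bandit subroutine."""
    taus = [estimate_transfer(v_up, a) for a in range(len(v_up))]
    pairs = [(a, b) for a in range(len(v_up)) for b in range(len(v_down[a]))]
    return max(pairs, key=lambda ab: v_down[ab[0]][ab[1]] - taus[ab[0]])


# Hypothetical instance: the upstream prefers action 0, but welfare peaks at (1, 1).
V_UP = [0.9, 0.4]
V_DOWN = [[0.1, 0.2], [0.8, 0.9]]
print(best_offer(V_UP, V_DOWN))
```

In this instance the shifted reward $v^{down}(a,b)-\tau^{\star}_a$ differs from $W(a,b)$ only by the constant $\max_{a'} v^{up}(a')$, which is why maximizing it selects the welfare-maximizing pair.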

Abstract

In economic theory, the concept of externality refers to any indirect effect resulting from an interaction between players that affects the social welfare. Most of the models within which externality has been studied assume that agents have perfect knowledge of their environment and preferences. This is a major hindrance to the practical implementation of many proposed solutions. To address this issue, we consider a two-player bandit setting where the actions of one of the players affect the other player and we extend the Coase theorem [Coase, 1960]. This result shows that the optimal approach for maximizing the social welfare in the presence of externality is to establish property rights, i.e., enable transfers and bargaining between the players. Our work removes the classical assumption that bargainers possess perfect knowledge of the underlying game. We first demonstrate that in the absence of property rights, the social welfare breaks down. We then design a policy for the players which allows them to learn a bargaining strategy which maximizes the total welfare, recovering the Coase theorem under uncertainty.

Paper Structure (12 sections, 16 theorems, 123 equations, 1 figure, 3 algorithms)

Introduction
Setup and Inefficiency of Externality
Bandit game
Inefficiency without property rights
Online Property Game with Bargaining Players
Online Property Game
Downstream player's procedure
Related work
Conclusion
Algorithmic Subroutine for the Binary Search
Invariance when the property rights are given to the downstream player
Proofs and Technical Results

Key Result

Theorem 3

Suppose that $\operatorname{argmax}_{a \in \mathcal{A}} v^\mathrm{up}(a)$ is the singleton $\{a^\mathrm{u}_\star\}$ and that for any $b \in \mathcal{A}$. In the absence of property rights and when the upstream player runs any no-regret policy $\Pi_{\mathrm{n}}^{\mathrm{up}}$, we have $\mathfrak{R}^\mathrm{sw}(T, \Pi_{\mathrm{n}}^{\mathrm{up}}, \Pi^{\mathrm{down}}_{\mathrm{n}}) = \Omega(T)$. There
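The linear lower bound can be illustrated numerically. The sketch below is not the paper's proof: it assumes the limiting behavior of a no-regret upstream (always playing their own preferred action), known mean rewards, and a hypothetical instance in which that preferred action is not welfare-maximizing for any downstream action. Under these assumptions the per-round welfare gap is a positive constant, so cumulative welfare regret grows proportionally to $T$.

```python
def welfare(v_up, v_down, a, b):
    """Social welfare W(a, b) = v_up(a) + v_down(a, b)."""
    return v_up[a] + v_down[a][b]


def regret_without_transfers(v_up, v_down, horizon):
    """Cumulative welfare regret when the upstream always plays argmax v_up
    (idealized limit of a no-regret policy) and the downstream best-responds."""
    n_up = len(v_up)
    best = max(
        welfare(v_up, v_down, a, b)
        for a in range(n_up)
        for b in range(len(v_down[a]))
    )
    a_selfish = max(range(n_up), key=lambda a: v_up[a])
    achieved = max(
        welfare(v_up, v_down, a_selfish, b)
        for b in range(len(v_down[a_selfish]))
    )
    return horizon * (best - achieved)


# Hypothetical instance: optimum W(1, 1) = 1.3, selfish outcome W(0, 1) = 1.1.
V_UP = [0.9, 0.4]
V_DOWN = [[0.1, 0.2], [0.8, 0.9]]
for T in (1000, 2000):
    print(T, regret_without_transfers(V_UP, V_DOWN, T))
```

Doubling the horizon doubles the regret in this instance, which is the behavior an $\Omega(T)$ bound describes.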

Figures (1)

Figure 1: Empirical frequencies of the upstream player's actions when property rights are not defined (left) and when they are defined (right).

Theorems & Definitions (30)

Example 1
Example 2 (continuation of Example 1)
Theorem 3
Example 4 (continuation of Example 1)
Lemma 1
Proposition 1
Theorem 5
Corollary 1
Theorem 6
Proof of the theorem labeled theorem:social_welfare_regret_no_incentives_long
...and 20 more
