Honor Among Bandits: No-Regret Learning for Online Fair Division

Ariel D. Procaccia; Benjamin Schiffer; Shirley Zhang

TL;DR

This paper studies online fair division with unknown, type-dependent valuations, framing the learning problem as a stochastic multi-armed bandit under fairness constraints. It proposes an explore-then-commit algorithm that uses a confidence-region LP to enforce envy-freeness in expectation or proportionality in expectation while keeping social welfare near optimal, achieving regret $\tilde{O}(T^{2/3})$. The authors prove a matching lower bound up to logarithmic factors, showing that this rate is tight, and develop fairness machinery that exploits structure in the fairness constraints to enable faster learning. The results advance the understanding of bandit learning under constraints and are relevant to online allocation problems where fairness matters, such as food donation and resource distribution.

Abstract

We consider the problem of online fair division of indivisible goods to players when there are a finite number of types of goods and player values are drawn from distributions with unknown means. Our goal is to maximize social welfare subject to allocating the goods fairly in expectation. When a player's value for an item is unknown at the time of allocation, we show that this problem reduces to a variant of (stochastic) multi-armed bandits, where there exists an arm for each player's value for each type of good. At each time step, we choose a distribution over arms which determines how the next item is allocated. We consider two sets of fairness constraints for this problem: envy-freeness in expectation and proportionality in expectation. Our main result is the design of an explore-then-commit algorithm that achieves $\tilde{O}(T^{2/3})$ regret while maintaining either fairness constraint. This result relies on unique properties fundamental to fair-division constraints that allow faster rates of learning, despite the restricted action space. We also prove a lower bound of $\tilde{\Omega}(T^{2/3})$ regret for our setting, showing that our results are tight.
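
As a rough intuition for where the $T^{2/3}$ rate comes from, the generic explore-then-commit trade-off can be written out as below. This is the standard textbook heuristic, not the paper's proof: the exploration length $N$ is a symbol introduced here for illustration, and constants as well as the dependence on the number of players and item types are omitted.

$$
\mathrm{Regret}(T) \;\lesssim\; \underbrace{N}_{\text{exploration rounds}} \;+\; \underbrace{T \sqrt{\frac{\log T}{N}}}_{\text{estimation error during commit}}
$$

$$
N = T^{2/3} \;\Longrightarrow\; \mathrm{Regret}(T) \;\lesssim\; T^{2/3} + T^{2/3}\sqrt{\log T} \;=\; \tilde{O}(T^{2/3})
$$

The first term charges up to constant regret for each exploration round; the second assumes the mean estimates after $N$ rounds are accurate to roughly $\sqrt{\log T / N}$, and that this error is paid in each of the at most $T$ commit rounds.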

Paper Structure (22 sections, 31 theorems, 141 equations, 4 algorithms)

Introduction
Our Results
Related Work
Model
Online Allocation With Unknown Values
Fairness Notions
Regret and Problem Formulation
Fairness Machinery
Algorithm and Regret Bounds
Lower bounds
Discussion
Algorithmic Representation of Model
Motivating Fairness in Expectation
Proportionality
Additional Model Notes
...and 7 more sections

Key Result

Lemma 1

The family of envy-freeness in expectation constraints satisfies Property def:fairness_to_UAR.
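
To make the envy-freeness in expectation constraints concrete, the following is a minimal Python sketch of a checker. It assumes a simplified reading of the model: `mu[i][k]` is player `i`'s mean value for item type `k`, `X[i][k]` is the probability that an item of type `k` is given to player `i`, and item-type arrival frequencies are ignored. All function and variable names are hypothetical and are not taken from the paper. The sketch also illustrates that allocating uniformly at random is envy-free in expectation, since every player then receives the same expected bundle.

```python
from typing import List

Matrix = List[List[float]]


def expected_value(mu_i: List[float], x_j: List[float]) -> float:
    """Expected value that a player with means mu_i assigns to the fractional bundle x_j."""
    return sum(m * x for m, x in zip(mu_i, x_j))


def is_envy_free_in_expectation(mu: Matrix, X: Matrix, tol: float = 1e-9) -> bool:
    """Return True if no player i values another player's bundle above its own (up to tol)."""
    n = len(mu)
    for i in range(n):
        own = expected_value(mu[i], X[i])
        for j in range(n):
            if expected_value(mu[i], X[j]) > own + tol:
                return False
    return True


def social_welfare(mu: Matrix, X: Matrix) -> float:
    """Sum over players of the expected value of their own bundle."""
    return sum(expected_value(mu[i], X[i]) for i in range(len(mu)))


def uniform_allocation(n_players: int, n_types: int) -> Matrix:
    """Allocate each item type to a player chosen uniformly at random."""
    return [[1.0 / n_players] * n_types for _ in range(n_players)]


# Illustrative means (made-up numbers): player 0 prefers type 0, player 1 prefers type 1.
mu = [[1.0, 0.2],
      [0.3, 0.9]]

uniform = uniform_allocation(2, 2)
matched = [[1.0, 0.0], [0.0, 1.0]]   # each type goes to the player who values it most
one_sided = [[1.0, 1.0], [0.0, 0.0]]  # everything goes to player 0

uniform_is_fair = is_envy_free_in_expectation(mu, uniform)
matched_is_fair = is_envy_free_in_expectation(mu, matched)
one_sided_is_fair = is_envy_free_in_expectation(mu, one_sided)
```

In this toy instance the uniform and matched allocations both pass the check while the one-sided allocation does not, and the matched allocation has higher social welfare than the uniform one. In the learning setting the means `mu` are unknown, so a check like this can only be applied to estimates; how the paper handles that uncertainty is described in its algorithm sections, not here.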

Theorems & Definitions (80)

Definition 1
Definition 2
Remark 1
Remark 2
Definition 3
Lemma 1
Proof sketch
Lemma 2
Proof sketch
Theorem 1
...and 70 more
