Keep Everyone Happy: Online Fair Division of Numerous Items with Few Copies
Arun Verma, Indrajit Saha, Makoto Yokoo, Bryan Kian Hsiang Low
TL;DR
This work tackles online fair division when a large number of indivisible items arrive sequentially but each item has only a few copies, making it infeasible to learn utilities for every item-agent pair individually. It models the problem as a contextual bandit with an unknown utility function f over item-agent features and introduces a goodness function G to capture the fairness–efficiency trade-off, notably the weighted Gini function, with Nash social welfare (NSW) and log-NSW as alternatives. The authors propose the OFD-UCB and OFD-TS algorithms, which construct optimistic utility estimates and allocate each item by maximizing G. They prove sub-linear regret in linear and certain non-linear settings and extend the analysis to a broad class of goodness functions satisfying local monotonicity and Lipschitz properties. Empirical results on synthetic data corroborate the theoretical guarantees, show the superiority of Thompson sampling, and illustrate how the fairness parameter rho governs the balance between fairness and efficiency. The approach provides a principled, scalable framework for real-world platforms that must fairly distribute scarce items while learning user-provider utilities online.
Abstract
This paper considers a novel variant of the online fair division problem involving multiple agents in which a learner sequentially observes an indivisible item that has to be irrevocably allocated to one of the agents while satisfying a fairness and efficiency constraint. Existing algorithms assume a small number of items with a sufficiently large number of copies, which ensures a good utility estimation for all item-agent pairs from noisy bandit feedback. However, this assumption may not hold in many real-life applications, for example, an online platform that has a large number of users (items) who use the platform's service providers (agents) only a few times (a few copies of items), which makes it difficult to accurately estimate utilities for all item-agent pairs. To address this, we assume utility is an unknown function of item-agent features. We then propose algorithms that model online fair division as a contextual bandit problem, with sub-linear regret guarantees. Our experimental results further validate the effectiveness of the proposed algorithms.
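To make the contextual-bandit formulation concrete, the following is a minimal sketch of a UCB-style allocation rule in the linear setting. It is an illustration under stated assumptions, not the authors' OFD-UCB implementation: the class name `LinUCBFairAllocator`, the geometric weights `rho**k` in `weighted_gini`, the exploration scale `alpha`, the ridge parameter `reg`, and the choice to track cumulative observed utilities per agent are all hypothetical choices made here for illustration.

```python
import numpy as np


def weighted_gini(utilities, rho):
    """Weighted Gini goodness: sort utilities ascending, weight the worst-off most.

    The geometric weights rho**k (0 < rho <= 1) are an assumed parameterization;
    rho = 1 reduces to the plain sum of utilities (pure efficiency).
    """
    u = np.sort(np.asarray(utilities, dtype=float))
    weights = rho ** np.arange(len(u))
    return float(weights @ u)


class LinUCBFairAllocator:
    """Hypothetical linear-UCB allocator that picks the agent maximizing G."""

    def __init__(self, n_agents, dim, rho=0.5, alpha=1.0, reg=1.0):
        self.n_agents = n_agents
        self.rho = rho
        self.alpha = alpha                      # assumed exploration scale
        self.V = reg * np.eye(dim)              # regularized Gram matrix
        self.b = np.zeros(dim)                  # sum of reward-weighted features
        self.cum_utility = np.zeros(n_agents)   # observed utility per agent so far

    def optimistic_utilities(self, features):
        """features has shape (n_agents, dim): one item-agent feature row per agent."""
        V_inv = np.linalg.inv(self.V)
        theta_hat = V_inv @ self.b              # ridge-regression estimate
        mean = features @ theta_hat
        # Confidence width sqrt(x^T V^{-1} x) for each agent's feature row.
        width = np.sqrt(np.einsum("nd,dk,nk->n", features, V_inv, features))
        return mean + self.alpha * width

    def allocate(self, features):
        """Return the agent whose optimistic gain yields the largest goodness value."""
        ucb = self.optimistic_utilities(features)
        scores = []
        for agent in range(self.n_agents):
            hypothetical = self.cum_utility.copy()
            hypothetical[agent] += ucb[agent]
            scores.append(weighted_gini(hypothetical, self.rho))
        return int(np.argmax(scores))

    def update(self, agent, x, reward):
        """Incorporate the noisy utility observed after giving the item to `agent`."""
        self.V += np.outer(x, x)
        self.b += reward * x
        self.cum_utility[agent] += reward
```

In this sketch, a smaller `rho` places more weight on the worst-off agent, so the rule tends to favor agents with low accumulated utility, while `rho = 1` simply maximizes the optimistic total. A Thompson-sampling variant would, under the same assumptions, replace the confidence bonus with a draw of the parameter vector from a posterior centered at the ridge estimate.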
