Keep Everyone Happy: Online Fair Division of Numerous Items with Few Copies

Arun Verma, Indrajit Saha, Makoto Yokoo, Bryan Kian Hsiang Low

TL;DR

This work tackles online fair division when a large number of indivisible items arrive sequentially but each item has only a few copies, making it infeasible to learn utilities for all item-agent pairs. It models the problem as a contextual bandit with an unknown utility function $f$ over item-agent features and introduces a goodness function $G$ to capture fairness–efficiency trade-offs, notably the weighted Gini social-evaluation function, with Nash social welfare (NSW) and log-NSW as alternatives. The authors propose two algorithms, OFD-UCB and OFD-TS, that construct optimistic utility estimates and allocate each item by maximizing $G$. They prove sub-linear regret under linear and certain non-linear settings, and extend the analysis to a broad class of goodness functions with local monotonicity and Lipschitz properties. Empirical results on synthetic data corroborate the theoretical guarantees, showing the superiority of Thompson sampling and illustrating how the fairness parameter $\rho$ governs the balance between fairness and efficiency. The approach provides a principled, scalable framework for real-world platforms that must fairly distribute scarce items while learning user-provider utilities online.
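The allocation rule summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear utility model estimated by ridge regression, and it assumes weighted Gini weights of the form $1, \rho, \rho^2, \dots$ applied to utilities sorted in ascending order. The function names (`weighted_gini`, `ucb_estimates`, `choose_agent`) and the bonus scale `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np


def weighted_gini(utilities, rho):
    """Weighted Gini value of a utility vector.

    Utilities are sorted in ascending order and the n-th smallest gets
    weight rho**n (an assumed weight schedule, not taken from the paper).
    """
    u = np.sort(np.asarray(utilities, dtype=float))
    weights = rho ** np.arange(u.size)  # 1, rho, rho^2, ...
    return float(weights @ u)


def ucb_estimates(features, V, b, alpha):
    """Optimistic utility for each agent under an assumed linear model.

    features: (N, d) array, one item-agent feature vector per agent.
    V, b:     ridge-regression statistics (V = lam*I + sum x x^T, b = sum x*y).
    alpha:    scale of the confidence bonus.
    """
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b
    # Confidence width sqrt(x^T V^{-1} x) for every agent's feature vector.
    widths = np.sqrt(np.einsum("nd,de,ne->n", features, V_inv, features))
    return features @ theta_hat + alpha * widths


def choose_agent(cum_utilities, optimistic, rho):
    """Return the agent whose receipt of the item maximizes the goodness value."""
    cum_utilities = np.asarray(cum_utilities, dtype=float)
    scores = []
    for n, gain in enumerate(optimistic):
        candidate = cum_utilities.copy()
        candidate[n] += gain  # hypothetical allocation of the item to agent n
        scores.append(weighted_gini(candidate, rho))
    return int(np.argmax(scores))
```

Under these assumptions, with cumulative utilities `[0, 5]` and optimistic estimates `[1, 1.5]`, `rho = 0.5` favors the worse-off agent 0, whereas `rho = 1.0` reduces the goodness value to total utility and picks agent 1. This mirrors the fairness–efficiency trade-off that $\rho$ is said to control.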

Abstract

This paper considers a novel variant of the online fair division problem involving multiple agents in which a learner sequentially observes an indivisible item that has to be irrevocably allocated to one of the agents while satisfying a fairness and efficiency constraint. Existing algorithms assume a small number of items with a sufficiently large number of copies, which ensures a good utility estimation for all item-agent pairs from noisy bandit feedback. However, this assumption may not hold in many real-life applications, for example, an online platform that has a large number of users (items) who use the platform's service providers (agents) only a few times (a few copies of items), which makes it difficult to accurately estimate utilities for all item-agent pairs. To address this, we assume utility is an unknown function of item-agent features. We then propose algorithms that model online fair division as a contextual bandit problem, with sub-linear regret guarantees. Our experimental results further validate the effectiveness of the proposed algorithms.

Paper Structure (22 sections, 10 theorems, 30 equations, 5 figures, 1 table, 1 algorithm)


Key Result

Theorem 1

Let $\delta \in (0,1)$, $\lambda > 0$, the noise in utility be $R$-sub-Gaussian, and the goodness function be as defined in eq:goodness_function with $w_{\max} = \max_{n \in \mathcal{N}} w_n$. Then, with probability at least $1-\delta$, the regret in $T > 0$ rounds is … where $\alpha_T = R\sqrt{d\log\left(\frac{1 + TL^2/\lambda}{\delta}\right)} + \lambda^{1/2}S$, …
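To get a feel for how $\alpha_T$ in Theorem 1 scales, the expression can be evaluated directly. The sketch below is only a numerical illustration. The statement above does not define $d$, $L$, or $S$, so the code assumes the usual linear-bandit reading: $d$ is the feature dimension, $L$ bounds the feature norm, and $S$ bounds the parameter norm. The function name `confidence_radius` is hypothetical.

```python
import math


def confidence_radius(T, d, R, L, S, lam, delta):
    """Evaluate alpha_T = R*sqrt(d*log((1 + T*L^2/lam)/delta)) + sqrt(lam)*S.

    Assumed meanings (not defined in the theorem statement above):
    d = feature dimension, L = bound on the feature norm,
    S = bound on the parameter norm.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if lam <= 0.0 or T <= 0:
        raise ValueError("lam and T must be positive")
    log_term = math.log((1.0 + T * L**2 / lam) / delta)
    return R * math.sqrt(d * log_term) + math.sqrt(lam) * S


if __name__ == "__main__":
    # alpha_T grows only logarithmically with the horizon T.
    for horizon in (10, 1_000, 100_000):
        print(horizon, confidence_radius(horizon, d=5, R=1.0, L=1.0, S=1.0, lam=1.0, delta=0.05))
```

Because $T$ enters only through the logarithm, the radius grows slowly with the horizon.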

Figures (5)

  • Figure 1: Example of an online fair division of numerous items with few copies: An online platform recommends a service provider (agent) to sequentially arriving users (items). The platform must balance two conflicting objectives: fairly recommending service providers to address their competing interests (fairness) and maximizing its own profit (efficiency).
  • Figure 2: Cumulative regret of our proposed online fair division algorithms compared with baseline algorithms; cumulative regret of OFD-UCB and OFD-TS for different numbers of copies of each item $(c)$; and fairness and efficiency measures (ratio of minimum utility to total utility, Gini coefficient, and total utility) as a function of the control parameter $(\rho)$.
  • Figure 3: Cumulative regret of OFD-UCB and OFD-TS vs. different values of $N$ and $d$.
  • Figure 4: Cumulative regret of OFD-UCB and OFD-TS vs. different values of $N$ and $d$ for $\rho =1.0$.
  • Figure 5: Cumulative regret of OFD-UCB and OFD-TS vs. different values of $d$ for $\rho =0.85$.

Theorems & Definitions (21)

  • Remark 1
  • Theorem 1
  • Definition 1: OFD Compatible Contextual Bandit Algorithm
  • Theorem 2
  • Definition 2: Locally monotonically non-decreasing and Lipschitz continuous function
  • Theorem 3
  • Lemma 1: Theorem 2 of Abbasi-Yadkori et al. (2011)
  • Lemma 2: Lemma 1 of Weymark (1981)
  • Lemma 3: Lemma 1 of Sim et al. (2021)
  • Lemma 4
  • ...and 11 more