Online Generalized-mean Welfare Maximization: Achieving Near-Optimal Regret from Samples

Zongjun Yang, Rachitesh Kumar, Christian Kroer

TL;DR

The paper addresses online fair allocation of $T$ unit items among $n$ agents under a generalized-mean welfare objective with $p\in(-\infty,1)$. It introduces a simple pure greedy policy for stationary i.i.d. arrivals that achieves near-optimal time-averaged regret $\tilde{O}(1/T)$ without distributional knowledge, and a single-sample re-solving framework that extends the same rate to arbitrarily nonstationary arrivals by forecasting future allocations from historical data. A robust coupling-based analysis shows that, even with distribution shifts measured by the time-averaged Wasserstein distance $\mathcal{W}$, the regret degrades gracefully to $\tilde{O}(1/\sqrt{T}+\mathcal{W})$. The general coupling construction also provides a foundation for analyzing broader online convex optimization problems with non-smooth objectives. The theoretical findings are complemented by experiments on real datasets, where the proposed methods converge quickly and remain resilient to distribution shifts, which points to practical relevance for scalable, fair, and data-efficient online decision-making.

Abstract

We study online fair allocation of $T$ sequentially arriving items among $n$ agents with heterogeneous preferences, with the objective of maximizing generalized-mean welfare, defined as the $p$-mean of agents' time-averaged utilities, with $p\in (-\infty, 1)$. We first consider the i.i.d. arrival model and show that the pure greedy algorithm -- which myopically chooses the welfare-maximizing integral allocation -- achieves $\widetilde{O}(1/T)$ average regret. Importantly, in contrast to prior work, our algorithm does not require distributional knowledge and achieves the optimal regret rate using only the online samples. We then go beyond i.i.d. arrivals and investigate a nonstationary model with time-varying independent distributions. In the absence of additional data about the distributions, it is known that every online algorithm must suffer $\Omega(1)$ average regret. We show that only a single historical sample from each distribution is sufficient to recover the optimal $\widetilde{O}(1/T)$ average regret rate, even in the face of arbitrary non-stationarity. Our algorithms are based on the re-solving paradigm: they assume that the remaining items will be the ones seen historically in those periods and solve the resulting welfare-maximization problem to determine the decision in every period. Finally, we also account for distribution shifts that may distort the fidelity of historical samples and show that the performance of our re-solving algorithms is robust to such shifts.
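
To make the greedy rule in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each item goes wholly to one agent, that utilities are additive, and that welfare is the unweighted $p$-mean of time-averaged utilities. The names `p_mean_welfare` and `greedy_allocate` and the tie-breaking rule (prefer the agent with the least accumulated utility) are illustrative assumptions; the summary above does not specify how ties are resolved.

```python
import numpy as np

def p_mean_welfare(utilities, p):
    """Unweighted generalized (power) mean of nonnegative utilities, p < 1."""
    u = np.asarray(utilities, dtype=float)
    if p <= 0 and np.any(u <= 0):
        # For p <= 0, any zero utility drives the p-mean to zero.
        return 0.0
    if p == 0:
        # p = 0 is the geometric mean (Nash welfare), the limit as p -> 0.
        return float(np.exp(np.mean(np.log(u))))
    return float(np.mean(u ** p) ** (1.0 / p))

def greedy_allocate(values, p):
    """Myopic integral allocation: each item goes to the agent whose
    receipt maximizes the welfare of the utilities accumulated so far.

    values[t][i] is agent i's value for the item arriving in period t.
    Returns (assignment list, accumulated utility per agent).
    """
    values = np.asarray(values, dtype=float)
    T, n = values.shape
    totals = np.zeros(n)
    assignment = []

    for t in range(T):
        def welfare_if(i):
            trial = totals.copy()
            trial[i] += values[t, i]
            return p_mean_welfare(trial / T, p)

        # Assumption: ties (common at zero welfare when p <= 0) are broken
        # in favor of the agent with the least accumulated utility.
        best = max(range(n), key=lambda i: (welfare_if(i), -totals[i]))
        totals[best] += values[t, best]
        assignment.append(best)

    return assignment, totals
```

For example, with two agents who each value alternating items, `greedy_allocate([[1, 0], [0, 1], [1, 0], [0, 1]], p=0.5)` gives every item to the agent who values it. The re-solving algorithms described in the abstract are different: they plan against historical samples of the remaining periods rather than acting myopically, and they are not sketched here.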

Paper Structure

This paper contains 80 sections, 30 theorems, 151 equations, 4 figures, 3 algorithms.

Key Result

Proposition 2.1

(Hardy et al., 1952; Moulin, 2004) A welfare function $f: \mathbb{R}_+^n \to \mathbb{R}_+$ satisfies the following $4$ axioms if and only if it is in the CES family. Moreover, $f$ satisfies the above $4$ axioms and the following principle simultaneously if and only if it is a CES welfare function with parameter $p \in (-\infty, 1)$.
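
For orientation, the CES (generalized-mean) family is commonly written as below. This is the standard textbook form with uniform weights $1/n$, stated here as an assumption; the paper's exact normalization and the axioms themselves are not reproduced in this summary.

$$
f_p(u) = \Big(\frac{1}{n}\sum_{i=1}^{n} u_i^{\,p}\Big)^{1/p} \quad (p \neq 0), \qquad
f_0(u) = \lim_{p \to 0} f_p(u) = \Big(\prod_{i=1}^{n} u_i\Big)^{1/n}.
$$

Under this form, $p = 0$ gives the Nash welfare (geometric mean), $p \to -\infty$ approaches the egalitarian welfare $\min_i u_i$, and $p = 1$, which lies outside the range considered, would give the utilitarian average.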

Figures (4)

  • Figure 1: Different generalized-mean welfare functions, parametrized by different values of $p$.
  • Figure 2: Simulations of the greedy algorithm and the re-solving algorithm on the Instagram notification dataset (left) and the MovieLens dataset (right) with the Nash welfare objective ($p=0$), under three input models: i.i.d. (top), periodic (middle), and true temporal (bottom).
  • Figure 3: Simulations of the greedy algorithm and the re-solving algorithm on the Instagram notification dataset (left) and the MovieLens dataset (right) with $p=0$ in the objective, under three input models: i.i.d. (top), periodic (middle), and true temporal (bottom).
  • Figure 4: Simulations of the greedy algorithm and the re-solving algorithm on the Instagram notification dataset (left) and the MovieLens dataset (right) with $p=0.5$ in the objective, under three input models: i.i.d. (top), periodic (middle), and true temporal (bottom).

Theorems & Definitions (35)

  • Definition 1
  • Proposition 2.1
  • Definition 2
  • Proposition 2.2
  • Proposition 2.3
  • Theorem 3.1
  • Lemma 3.1
  • Lemma 3.1
  • Definition 3: General Position
  • Theorem 4.1
  • ...and 25 more