Online Generalized-mean Welfare Maximization: Achieving Near-Optimal Regret from Samples
Zongjun Yang, Rachitesh Kumar, Christian Kroer
TL;DR
The paper addresses online fair allocation of $T$ unit items among $n$ agents under a generalized-mean welfare objective with $p\in(-\infty,1)$. For stationary i.i.d. arrivals, it shows that a simple pure greedy policy achieves near-optimal time-averaged regret $\tilde{O}(1/T)$ without distributional knowledge. For arbitrarily nonstationary arrivals, it introduces a single-sample re-solving framework that attains the same rate by treating one historical sample per period as a stand-in for the items yet to arrive. A robust coupling-based analysis shows that, under distribution shifts measured by the time-averaged Wasserstein distance $\mathcal{W}$, the regret degrades gracefully to $\tilde{O}(1/\sqrt{T}+\mathcal{W})$, and the general coupling construction lays a foundation for analyzing broader online convex optimization problems with non-smooth objectives. Experiments on real datasets complement the theory: the proposed methods converge quickly and remain resilient to distribution shifts, which supports their practical relevance for scalable, fair, and data-efficient online decision-making.
Abstract
We study online fair allocation of $T$ sequentially arriving items among $n$ agents with heterogeneous preferences, with the objective of maximizing generalized-mean welfare, defined as the $p$-mean of agents' time-averaged utilities, with $p\in (-\infty, 1)$. We first consider the i.i.d. arrival model and show that the pure greedy algorithm -- which myopically chooses the welfare-maximizing integral allocation -- achieves $\widetilde{O}(1/T)$ average regret. Importantly, in contrast to prior work, our algorithm does not require distributional knowledge and achieves the optimal regret rate using only the online samples. We then go beyond i.i.d. arrivals and investigate a nonstationary model with time-varying independent distributions. In the absence of additional data about the distributions, it is known that every online algorithm must suffer $\Omega(1)$ average regret. We show that only a single historical sample from each distribution is sufficient to recover the optimal $\widetilde{O}(1/T)$ average regret rate, even in the face of arbitrary non-stationarity. Our algorithms are based on the re-solving paradigm: they assume that the remaining items will be the ones seen historically in those periods and solve the resulting welfare-maximization problem to determine the decision in every period. Finally, we also account for distribution shifts that may distort the fidelity of historical samples and show that the performance of our re-solving algorithms is robust to such shifts.
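
To make the objective and the greedy rule concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes additive utilities, a hypothetical `(T, n)` value matrix `values`, and a small positive starting utility `eps` (an assumption made here so that the $p$-mean is well defined for $p \le 0$ before an agent has received anything). The names `p_mean` and `pure_greedy` are illustrative only.

```python
import numpy as np


def p_mean(u, p):
    """Generalized p-mean of a strictly positive utility vector u, for p < 1."""
    u = np.asarray(u, dtype=float)
    if p == 0:
        # The limit p -> 0 is the geometric mean (Nash welfare).
        return float(np.exp(np.mean(np.log(u))))
    return float(np.mean(u ** p) ** (1.0 / p))


def pure_greedy(values, p, eps=1e-9):
    """Myopic integral allocation under the p-mean objective.

    values[t, i] is agent i's value for the item arriving in period t.
    Each item goes, whole, to the agent whose receipt maximizes the p-mean
    of the cumulative utilities so far. Because the p-mean is positively
    homogeneous, maximizing over cumulative utilities picks the same agent
    as maximizing over time-averaged utilities.
    """
    values = np.asarray(values, dtype=float)
    T, n = values.shape
    totals = np.full(n, eps)  # assumption: tiny positive start avoids log(0) and 0**p
    owners = []
    for t in range(T):
        best_agent, best_welfare = 0, -np.inf
        for i in range(n):
            trial = totals.copy()
            trial[i] += values[t, i]
            welfare = p_mean(trial, p)
            if welfare > best_welfare:
                best_agent, best_welfare = i, welfare
        totals[best_agent] += values[t, best_agent]
        owners.append(best_agent)
    return owners, totals / T


# Two agents, two items, each agent values only one of the items.
owners, avg_utils = pure_greedy([[1.0, 0.0], [0.0, 1.0]], p=0)
```

The sketch uses only the arriving item and the running utility totals, which reflects the point that the greedy policy needs no distributional knowledge. A re-solving variant would instead plug historical samples in for the periods still to come and solve the resulting welfare-maximization problem at each step; that optimization is not sketched here.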
