Online Resource Allocation with Non-Stationary Customers

Xiaoyue Zhang, Hanzhang Qin, Mabel C. Chou

TL;DR

The paper tackles online resource allocation under non-stationary customer arrivals with unknown click-through rates (CTRs). It introduces the Unified Learning-while-Earning (ULwE) algorithm within the Contextual Bandits with Knapsacks framework, which adapts in real time by switching between an LP-based strategy (for near-stationary arrivals) and an adversarial strategy (for non-stationary arrivals) while learning CTR parameters in a Bayesian-like parametric model. The authors prove a sublinear regret bound in near-IID settings and a constant competitive ratio under general non-stationary arrivals, anchored by a deterministic LP upper bound on the optimal revenue, and validate the approach with extensive simulations across near-IID, adversarial, and general arrivals. The work advances online resource allocation by coupling online learning of CTRs with inventory-aware decision rules, enabling robust performance in highly dynamic environments, with practical implications for online advertising and service systems.

Abstract

We propose a novel algorithm for online resource allocation with non-stationary customer arrivals and unknown click-through rates. We assume multiple types of customers arrive in a non-stationary stochastic fashion, with unknown arrival rates in each period, and that customers' click-through rates are unknown and can only be learned online. By leveraging results from stochastic contextual bandits with knapsacks and online matching with adversarial arrivals, we develop an online scheme to allocate the resources to non-stationary customers. We prove that under mild conditions, our scheme achieves a "best-of-both-worlds" result: the scheme has sublinear regret when the customer arrivals are near-stationary, and enjoys an optimal competitive ratio under general (non-stationary) customer arrival distributions. Finally, we conduct extensive numerical experiments to show that our approach generates near-optimal revenues across all customer scenarios.
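
The two guarantees named in the abstract measure performance in different ways: regret is an additive gap to a benchmark revenue, while a competitive ratio is a multiplicative fraction of it. The sketch below is only an illustration of those two standard definitions. The function names and the revenue figures are hypothetical, chosen for this example, and are not taken from the paper or its experiments.

```python
def regret(benchmark_revenue: float, algorithm_revenue: float) -> float:
    """Additive gap between a benchmark revenue and the algorithm's revenue."""
    return benchmark_revenue - algorithm_revenue


def competitive_ratio(benchmark_revenue: float, algorithm_revenue: float) -> float:
    """Fraction of the benchmark revenue that the algorithm collects."""
    if benchmark_revenue <= 0:
        raise ValueError("benchmark revenue must be positive")
    return algorithm_revenue / benchmark_revenue


# Hypothetical revenues, made up for illustration (NOT results from the paper).
# Each entry maps a horizon T to (benchmark revenue, algorithm revenue).
hypothetical_runs = {
    100: (100.0, 80.0),
    10_000: (10_000.0, 9_800.0),
}

for horizon, (benchmark, algorithm) in hypothetical_runs.items():
    gap = regret(benchmark, algorithm)
    # Sublinear regret means the per-period gap shrinks as the horizon grows.
    per_period_gap = gap / horizon
    ratio = competitive_ratio(benchmark, algorithm)
    print(f"T={horizon}: regret={gap}, per-period={per_period_gap}, ratio={ratio}")
```

In these made-up numbers the total regret grows from 20 to 200 while the per-period regret falls from 0.2 to 0.02, which is the pattern a sublinear regret bound describes. A constant competitive ratio instead promises only that the ratio stays above a fixed fraction, whatever the arrival sequence.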

Paper Structure

This paper contains 32 sections, 24 theorems, 99 equations, 4 figures, 3 tables, 3 algorithms.

Key Result

Theorem 3.1

In any case of nonstationary arrivals, the algorithm guarantees a constant competitive ratio. When the arrivals are stationary, the algorithm guarantees sublinear regret.

Figures (4)

  • Figure 1: Problem categories
  • Figure 2: Regret over Time under IID Arrival
  • Figure 3: Regret over Time under Adversarial Arrival (ADV1)
  • Figure 4: Regret over Time under Adversarial Arrival (ADV2)

Theorems & Definitions (40)

  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Proposition 2.1
  • proof
  • Lemma 3.1
  • Lemma 3.2
  • proof
  • Corollary 3.3
  • Lemma 3.4
  • ...and 30 more