
Online Linear Programming with Batching

Haoran Xu, Peter W. Glynn, Yinyu Ye

TL;DR

This work studies Online Linear Programming (OLP) with batching: the horizon is partitioned into $K$ batches, and decisions on customers arriving within a batch can be deferred to the end of that batch. When the conditional reward distribution is continuous, the authors develop Action-History-Dependent Learning-based algorithms that solve only one LP per batch. They prove $O(\log K)$ regret upper bounds and give a matching $\Omega(\log K)$ lower bound in the single-resource setting where the total number of customers is known. They then extend the upper bounds to multiple resources and to Poisson arrivals. The paper also incorporates customer impatience, analyzes its effect on regret, and provides a method for selecting a batch size that balances delay against opportunity loss. Numerical experiments corroborate the $O(\log K)$ regret and show the practical benefits of batching, including regret that stays bounded uniformly in the horizon length when $K$ is fixed. Overall, the work clarifies how batching reduces both regret and computation in online resource allocation, and it gives actionable guidance on batch sizing under uncertainty and impatience.
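The "one LP per batch" idea can be illustrated with a generic dual-price sketch for the single-resource case. This is not the authors' AhdLA algorithm. The pro-rata budget for the first batch, the history-scaled capacity `remaining * t / (n - t)`, and the names `solve_lp` and `batched_policy` are all assumptions made for illustration. With one resource, the LP relaxation is a fractional knapsack, so a greedy sort by reward-to-consumption ratio solves it exactly. The ratio of the marginal customer then gives the dual price. The sketch also assumes positive consumptions and, like the abstract's algorithms, delays decisions only in the first and last batches.

```python
import random


def solve_lp(rewards, sizes, capacity):
    """Exactly solve the single-resource LP relaxation
        max sum_j r_j x_j  s.t.  sum_j a_j x_j <= capacity,  0 <= x_j <= 1
    (a fractional knapsack) by sorting on r_j / a_j. Assumes every a_j > 0.
    Returns (LP value, indices with x_j = 1, a dual price of the resource)."""
    order = sorted(range(len(rewards)),
                   key=lambda j: rewards[j] / sizes[j], reverse=True)
    value, used, accepted, price = 0.0, 0.0, [], 0.0
    for j in order:
        if rewards[j] <= 0:
            break  # non-positive rewards are never worth accepting
        if used + sizes[j] <= capacity:
            value += rewards[j]
            used += sizes[j]
            accepted.append(j)
        else:
            # The marginal customer is taken fractionally; its ratio is the dual price.
            value += rewards[j] * max(capacity - used, 0.0) / sizes[j]
            price = rewards[j] / sizes[j]
            break
    return value, accepted, price


def batched_policy(rewards, sizes, capacity, K):
    """Hypothetical batched policy: defer decisions in the first and last batch,
    and in every middle batch solve one LP on the history to get a price."""
    n = len(rewards)
    bounds = [round(k * n / K) for k in range(K + 1)]
    remaining, collected = capacity, 0.0
    for k in range(K):
        lo, hi = bounds[k], bounds[k + 1]
        if k == 0 or k == K - 1:
            # Decisions deferred to the batch end, then made by an LP on the batch.
            # Assumed budget: pro rata for batch 1, everything left for the last batch.
            cap = remaining if k == K - 1 else capacity * (hi - lo) / n
            _, acc, _ = solve_lp(rewards[lo:hi], sizes[lo:hi], cap)
            for i in acc:
                collected += rewards[lo + i]
                remaining -= sizes[lo + i]
        else:
            # One LP per batch on all t = lo past customers, with the remaining
            # budget scaled to the history length (action-history dependence).
            _, _, price = solve_lp(rewards[:lo], sizes[:lo],
                                   remaining * lo / (n - lo))
            for j in range(lo, hi):
                if rewards[j] > price * sizes[j] and sizes[j] <= remaining:
                    collected += rewards[j]
                    remaining -= sizes[j]
    return collected


random.seed(0)
n, K = 2000, 8
rewards = [random.random() for _ in range(n)]  # assumed Uniform(0, 1) rewards
sizes = [1.0] * n                              # assumed unit consumption
b = 0.25 * n                                   # assumed capacity
offline, _, _ = solve_lp(rewards, sizes, b)    # hindsight LP benchmark
online = batched_policy(rewards, sizes, b, K)
regret = offline - online
print(f"offline={offline:.2f} online={online:.2f} regret={regret:.2f}")
```

Regret here is measured against the hindsight LP value. That benchmark upper-bounds any feasible online reward, so the printed regret is nonnegative. Varying `K` in this toy setup is one way to explore the batching trade-off numerically.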

Abstract

We study Online Linear Programming (OLP) with batching. The planning horizon is cut into $K$ batches, and the decisions on customers arriving within a batch can be delayed to the end of their associated batch. Compared with OLP without batching, the ability to delay decisions brings better operational performance, as measured by regret. Two research questions of interest are: (1) What is a lower bound on the regret as a function of $K$? (2) What algorithms can achieve this lower bound? These questions have been analyzed in the literature when the distribution of the reward and the resource consumption of the customers has finite support. By contrast, this paper analyzes these questions when the conditional distribution of the reward given the resource consumption is continuous, and we show that the answers are different in this setting. When there is only a single type of resource and the decision maker knows the total number of customers, we propose an algorithm with an $O(\log K)$ regret upper bound and provide an $\Omega(\log K)$ regret lower bound. We also propose algorithms with $O(\log K)$ regret upper bounds for the setting with multiple types of resources and for the setting in which customers arrive according to a Poisson process. All these regret upper and lower bounds are independent of the length of the planning horizon, and all the proposed algorithms delay decisions only for customers arriving in the first and last batches. We also take customer impatience into consideration and establish a way of selecting an appropriate batch size.
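The tension behind choosing a batch size can be shown with a toy trade-off. This is an illustration only, not the paper's selection rule. The model rests on four assumptions:

- Poisson arrivals at rate `lam` and exponential patience with rate `theta`, as in the setting of Figure 5.
- Waiting is charged only in the first and last batches, because the abstract states that the proposed algorithms delay decisions only there.
- Each lost customer costs `mean_reward`, which is a crude proxy.
- An assumed constant `c` multiplies $\log K$ in place of the unspecified constant in the $O(\log K)$ bound.

Under Poisson arrivals, a customer's wait until its batch ends is Uniform$(0, L)$ with $L = T/K$. The abandonment probability is therefore $1 - (1 - e^{-\theta L})/(\theta L)$. All function and parameter names are hypothetical.

```python
import math


def abandon_prob(L, theta):
    """P(customer leaves before its batch ends) when the wait W is Uniform(0, L)
    and patience is Exponential(theta): E[1 - exp(-theta * W)]."""
    x = theta * L
    if x == 0:
        return 0.0
    return 1.0 - (-math.expm1(-x)) / x  # -expm1(-x) = 1 - exp(-x), computed stably


def toy_cost(K, T, lam, theta, mean_reward, c):
    """Hypothetical cost of using K batches on [0, T]: c * log K stands in for the
    O(log K) regret term; the second term is expected reward lost to impatience
    among customers in the delayed (first and last) batches."""
    L = T / K
    delayed_batches = 1 if K == 1 else 2          # with K = 1 they coincide
    expected_waiting = delayed_batches * lam * L  # Poisson arrivals, rate lam
    return c * math.log(K) + mean_reward * expected_waiting * abandon_prob(L, theta)


def pick_batches(K_max, **params):
    """Number of batches in 1..K_max that minimizes the toy cost."""
    return min(range(1, K_max + 1), key=lambda K: toy_cost(K, **params))


params = dict(T=100.0, lam=20.0, theta=0.05, mean_reward=0.5, c=5.0)
best_K = pick_batches(50, **params)
print(best_K, round(toy_cost(best_K, **params), 2))
```

In this toy model, a larger `theta` (less patient customers) pushes the minimizer toward more, shorter batches. A larger `c` favors fewer batches.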

Paper Structure (23 sections, 18 theorems, 427 equations, 5 figures, 4 tables)

Key Result

Lemma 1

(a) (Proposition 1 of li2019olp) For all $d\in \otimes_{i=1}^m(\underline{d},\bar{d})$ and positive integers $N>m$, where $e\in\mathbb{R}^m$ is the vector with all components being 1. (b) (Lemma 1 of Bray2019, Lemma 2 of Bray2019, Lemma 12 of li2019olp) There exists a neighborhood $\Omega_d$ of $d_0$ such that $\Omega_d\subseteq\otimes_{i=1}^m(\underline{d},\bar{d})$, and for all $d\in\Omega_d$,

Figures (5)

  • Figure 1: Regret of Algorithm AhdLA and Algorithm AhdLAMulti: Impact of $n$
  • Figure 2: Regret of Algorithm AhdLA and Algorithm AhdLAMulti: Impact of $K$
  • Figure 3: Regret of Algorithm RaAhdLA: Impact of $\lambda$ and $T$
  • Figure 4: Regret of Algorithm RaAhdLA: Impact of $K$
  • Figure 5: Regret with Poisson Process and Exponential Customer Impatience

Theorems & Definitions (31)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Proposition 1
  • Proof
  • ...and 21 more