Infrequent Resolving Algorithm for Online Linear Programming
Guokai Li, Zizhuo Wang, Jingwei Zhang
TL;DR
This work tackles online linear programming with unknown finite-support arrivals by introducing the Argmax with Infrequent Resolving (AIR) policy, which resolves the fluid LP at a small number of strategically chosen time points and performs first-order updates in between. AIR achieves constant $\mathcal{O}(1)$ regret under a resolving schedule with $\mathcal{O}(\log\log T)$ LP solves, and extends to finite resolving with $M$ solves, yielding $\mathcal{O}\left(T^{(1/2+\epsilon)^{M-1}}\right)$ regret. A variant for known arrival probabilities, AIR-KP, attains $\mathcal{O}(1)$ regret with $\mathcal{O}(\log\log T)$ solves and $\mathcal{O}\left(T^{(1/2+\epsilon)^M}\right)$ regret with $M$ solves. Empirical results corroborate the theoretical gains, showing AIR’s strong performance and substantial computational savings relative to fully LP-based or LP-free baselines, even under nonstationary or Markov-modulated arrivals.
Abstract
Online linear programming (OLP) has gained significant attention from both researchers and practitioners due to its extensive applications, such as online auctions, network revenue management, order fulfillment, and advertising. Existing OLP algorithms fall into two categories: LP-based algorithms and LP-free algorithms. The former typically guarantee better performance but require solving a large number of LPs, which can be computationally expensive. In contrast, LP-free algorithms require only first-order computations but deliver worse performance. In this work, we bridge the gap between these two extremes by proposing a well-performing algorithm that solves LPs at a few selected time points and conducts first-order computations at all other time points. Specifically, for the case where the inputs are drawn from an unknown finite-support distribution, the proposed algorithm achieves a constant regret (even for the hard "degenerate" case) while solving LPs only O(log log T) times over the time horizon T. Moreover, when we are allowed to solve LPs only M times, we design the corresponding schedule such that the proposed algorithm guarantees a nearly O(T^((1/2)^(M-1))) regret. Our work highlights the value of resolving both at the beginning and the end of the selling horizon, and provides a novel framework to prove the performance guarantee of the proposed policy under different infrequent resolving schedules. Numerical experiments demonstrate the efficiency of the proposed algorithms.
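The two guarantees above are linked by simple arithmetic: the bound T^((1/2)^(M-1)) stays below a fixed constant once (1/2)^(M-1) · ln T ≤ 1, that is, once M ≥ 1 + log2(ln T), which grows like log log T. The sketch below only illustrates this calculation and is not the paper's resolving schedule. The function names are hypothetical, the ε slack hidden behind "nearly" is set to zero by default, and the constant e is chosen as the threshold purely for convenience.

```python
import math


def regret_exponent(num_solves: int, eps: float = 0.0) -> float:
    """Exponent a in the T^a regret bound with M LP solves: (1/2 + eps)^(M - 1).

    eps stands in for the slack behind "nearly"; eps = 0 is an
    illustrative simplification, not a claim from the paper.
    """
    if num_solves < 1:
        raise ValueError("num_solves must be at least 1")
    return (0.5 + eps) ** (num_solves - 1)


def solves_for_constant_bound(horizon: int) -> int:
    """Smallest M such that T^((1/2)^(M-1)) <= e (with eps = 0).

    Equivalent to (1/2)^(M-1) * ln T <= 1, i.e. M >= 1 + log2(ln T).
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    if horizon <= 2:
        # ln T <= 1 already, so a single solve keeps the bound below e.
        return 1
    return 1 + math.ceil(math.log2(math.log(horizon)))


if __name__ == "__main__":
    for T in (10**3, 10**6, 10**12):
        M = solves_for_constant_bound(T)
        bound = T ** regret_exponent(M)
        print(f"T={T:>14}  M={M}  T^exponent={bound:.3f}")
```

Squaring the horizon adds only one solve to the count (for example, moving from T = 10^6 to T = 10^12 raises M from 5 to 6), which is the doubly logarithmic growth stated in the abstract.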
