
Wait-Less Offline Tuning and Re-solving for Online Decision Making

Jingruo Sun, Wenzhi Gao, Ellen Vitercik, Yinyu Ye

TL;DR

The paper addresses online linear programming for dynamic resource allocation, where decisions must be made in real time while resources are limited. It proposes a hybrid, parallel multi-phase framework that re-solves LP subproblems at a frequency $f$ and runs a parallel first-order method between solves, yielding a regret bound of $O\left(\log\left(\tfrac{T}{f}\right) + \sqrt{f}\right)$ that interpolates between LP-based and first-order methods. Theoretical analysis introduces a unified performance metric and a spectrum theorem, and experiments show substantial regret reductions (over 10x) and dramatic runtime savings (over 100x) compared to baselines, with wait-less online decisions throughout the horizon. The approach has practical impact for large-scale, time-sensitive decision tasks and offers a tractable way to balance decision quality with computational constraints, along with extensions such as enhanced multi-start strategies and optimal re-solving frequency under resource limits.
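As a quick check of the interpolation claim, read $f$ as the number of periods between consecutive LP re-solves, which is how the bound behaves. The two extremes then recover the familiar rates. The last line is only an illustrative calculation. It assumes hypothetical constants $C_1, C_2 > 0$ in front of the two terms and is not the paper's Proposition 3.6, which is stated under resource limits.

```latex
\begin{aligned}
&\text{Regret}(T, f) = O\!\left(\log\tfrac{T}{f} + \sqrt{f}\right), \\
&f = 1:\ O(\log T)\ \text{(LP re-solve every period)}, \qquad
 f = T:\ O(\sqrt{T})\ \text{(essentially first-order only)}, \\
&g(f) = C_1 \log\tfrac{T}{f} + C_2\sqrt{f}, \quad
 g'(f) = -\frac{C_1}{f} + \frac{C_2}{2\sqrt{f}} = 0
 \;\Longrightarrow\; f^{\star} = \left(\frac{2C_1}{C_2}\right)^{2}.
\end{aligned}
```

Since $g''(f^{\star}) = C_1 / (2 f^{\star 2}) > 0$, this stationary point is a minimum, and in this toy model $f^{\star}$ does not depend on $T$. Longer re-solving intervals would therefore pay off mainly when computation, not regret, is the binding constraint.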

Abstract

Online linear programming (OLP) has found broad applications in revenue management and resource allocation. State-of-the-art OLP algorithms achieve low regret by repeatedly solving linear programming (LP) subproblems that incorporate updated resource information. However, LP-based methods are computationally expensive and often inefficient for large-scale applications. In contrast, recent first-order OLP algorithms are more computationally efficient but typically suffer from worse regret guarantees. To address these shortcomings, we propose a new algorithm that combines the strengths of LP-based and first-order OLP methods. The algorithm re-solves the LP subproblems periodically at a predefined frequency $f$ and uses the latest dual prices to guide online decision-making. In addition, a first-order method runs in parallel during each interval between LP re-solves, smoothing resource consumption. Our algorithm achieves $\mathscr{O}(\log (T/f) + \sqrt{f})$ regret, delivering a "wait-less" online decision-making process that balances the computational efficiency of first-order methods and the superior regret guarantee of LP-based methods.
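The interplay between periodic re-solves and first-order updates can be sketched as a small simulation. This is a minimal sketch under stated assumptions and is not the paper's Algorithm alg:main-1 or alg:main-2. It assumes i.i.d. requests with uniform rewards and consumptions and binary accept/reject decisions priced by the current dual vector. The first-order method is taken to be a projected subgradient step on the dual price. The LP re-solve is replaced by a stand-in (many projected subgradient steps on the sample-average dual) so the example needs no LP solver. All function names (`hybrid_olp`, `approx_resolve`, `first_order_step`) and parameters are hypothetical. The re-solve also runs synchronously here, whereas the paper runs it in parallel so that decisions never wait on it.

```python
import random
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def first_order_step(p: Vector, a: Vector, x: int, target: Vector, eta: float) -> Vector:
    """One projected subgradient step on the dual price: p <- max(p + eta*(a*x - target), 0)."""
    return [max(pi + eta * (ai * x - ti), 0.0) for pi, ai, ti in zip(p, a, target)]


def approx_resolve(history: List[Tuple[float, Vector]], d_rem: Vector, p0: Vector,
                   iters: int = 100, eta: float = 0.05) -> Vector:
    """Hypothetical stand-in for an LP re-solve: projected subgradient descent on
    min_{p >= 0}  d_rem^T p + (1/n) * sum_s max(r_s - a_s^T p, 0)."""
    p = list(p0)
    n = len(history)
    for k in range(iters):
        g = list(d_rem)  # subgradient of the linear term
        for r, a in history:
            if r - dot(a, p) > 0:  # this sample's hinge term is active
                g = [gi - ai / n for gi, ai in zip(g, a)]
        step = eta / (k + 1) ** 0.5  # diminishing step size
        p = [max(pi - step * gi, 0.0) for pi, gi in zip(p, g)]
    return p


def hybrid_olp(T: int, m: int, f: int, d: float = 0.25, seed: int = 0) -> Tuple[float, Vector]:
    """Hypothetical simulation: re-solve every f periods, first-order updates in between."""
    rng = random.Random(seed)
    remaining = [d * T] * m   # total budget b = T * d for each resource (assumed)
    target = [d] * m          # per-period consumption target
    p = [0.0] * m             # dual prices used for decisions
    eta = 1.0 / T ** 0.5      # O(1/sqrt(T)) learning rate (assumed)
    history: List[Tuple[float, Vector]] = []
    total = 0.0
    for t in range(T):
        r, a = rng.random(), [rng.random() for _ in range(m)]  # i.i.d. request (assumed)
        # Decide with the price already available before this request arrived.
        feasible = all(ai <= bi for ai, bi in zip(a, remaining))
        x = 1 if feasible and r > dot(a, p) else 0
        if x:
            total += r
            remaining = [bi - ai for bi, ai in zip(remaining, a)]
        history.append((r, a))
        left = T - t - 1
        if left > 0 and (t + 1) % f == 0:
            # Periodic re-solve with the updated remaining resources; the resulting
            # price warm-starts the first-order updates of the next interval.
            target = [bi / left for bi in remaining]
            p = approx_resolve(history, target, p)
        else:
            p = first_order_step(p, a, x, target, eta)
    return total, remaining
```

For example, `hybrid_olp(T=300, m=2, f=50)` returns the collected reward and the leftover budget. Varying `f` trades the cost of re-solving against the quality of the prices. Replacing `approx_resolve` with an actual LP solve of the sample problem (for instance via `scipy.optimize.linprog`) would bring the sketch closer to the LP-based component.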

Paper Structure

This paper contains 62 sections, 22 theorems, 104 equations, 8 figures, 4 tables, 5 algorithms.

Key Result

Theorem 3.1

Under Assumptions ass:1 and ass:2, the performance $\Delta_T$ of Algorithm alg:main-1 satisfies where $(\cdot)^{B+}$ indicates the projection of binding terms onto the positive orthant.

Figures (8)

  • Figure 1: Illustration of Algorithm alg:main-1, showing the parallel paths and the interactions between online learning and decision-making. Decisions are generated by 1) the LP-based method (blue) at frequency $f$, 2) the first-order method (red) in the initial and final phases (with a warm start), and 3) the latest dual price (yellow) in the intermediate phases.
  • Figure 2: Illustration of Algorithm alg:main-2, showing parallel paths with multi-time restarts. Decisions are generated by 1) the LP-based method (blue) at frequency $f$ and 2) the first-order method (red), warm-started, during the re-solving intervals.
  • Figure 3: Algorithms comparison.
  • Figure 4: Evaluations of Algorithms alg:main-1 and alg:main-2 across various horizons, re-solving frequencies, and stochastic inputs, validating the positive relationship between regret and re-solving frequency stated in the spectrum theorem (thm:spectrum).
  • Figure 5: Algorithms under New Distribution.
  • ...and 3 more figures

Theorems & Definitions (25)

  • Theorem 3.1: Performance Metric
  • Theorem 3.2
  • Remark 3.3: Spectrum Theorem
  • Remark 3.4: Warm Start
  • Remark 3.5: Learning Rate Selection
  • Proposition 3.6: Optimal Re-solving Frequency
  • Lemma 1.1: Dimension Stability
  • Lemma 1.2: Quadratic Regularity, Proposition 2 in li2022online
  • Lemma 1.3: Boundedness of LP result
  • Lemma 1.4: Dual Convergence of LP-based algorithm
  • ...and 15 more