Beyond $\mathcal{O}(\sqrt{T})$ Regret: Decoupling Learning and Decision-making in Online Linear Programming

Wenzhi Gao; Dongdong Ge; Chenyu Xue; Chunlin Sun; Yinyu Ye

TL;DR

The paper studies online linear programming (OLP) under stochastic inputs and proposes a framework, built on dual error bound conditions, that lets first-order methods achieve regret below $\mathcal{O}(\sqrt{T})$. By introducing an exploration-exploitation strategy that decouples learning from decision-making, the authors derive regret bounds that interpolate between the continuous-support and finite-support settings through a Hölder-growth parameter $\gamma$. Key contributions are $o(\sqrt{T})$ regret under continuous support, $\mathcal{O}(\log T)$ regret under finite support, and a general bound of $\mathcal{O}(T^{(\gamma-1)/(2\gamma-1)} \log T)$ under $\gamma$-Hölder growth, together with a practical algorithm that first learns a neighborhood of the dual optimal set and then localizes its decisions there. The framework broadens the applicability of first-order OLP methods, offering a computationally cheaper alternative to LP-based online algorithms while retaining strong regret guarantees. The results are relevant to revenue management and resource allocation, where online decisions must be fast and scalable and come with provable guarantees.
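
As a quick check of how the general bound interpolates, the exponent $(\gamma-1)/(2\gamma-1)$ can be evaluated at a few values of $\gamma$. The pairing of $\gamma = 2$ with the continuous-support case is an assumption here, since the summary does not state which $\gamma$ that setting corresponds to.

$$
\left.\frac{\gamma-1}{2\gamma-1}\right|_{\gamma=1} = 0,
\qquad
\left.\frac{\gamma-1}{2\gamma-1}\right|_{\gamma=2} = \frac{1}{3},
\qquad
\lim_{\gamma\to\infty}\frac{\gamma-1}{2\gamma-1} = \frac{1}{2}.
$$

So $\gamma = 1$ recovers the $\mathcal{O}(\log T)$ bound of the finite-support setting, $\gamma = 2$ gives $\mathcal{O}(T^{1/3} \log T)$, which is $o(\sqrt{T})$, and since $2\gamma - 2 < 2\gamma - 1$ the exponent stays strictly below $1/2$ for every finite $\gamma \geq 1$.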

Abstract

Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O} ( \sqrt{T} )$, which is suboptimal compared to the $\mathcal{O} (\log T)$ bound guaranteed by the state-of-the-art linear programming (LP)-based online algorithms. This paper establishes a general framework that improves upon the $\mathcal{O} ( \sqrt{T} )$ result when the LP dual problem exhibits certain error bound conditions. For the first time, we show that first-order learning algorithms achieve $o( \sqrt{T} )$ regret in the continuous support setting and $\mathcal{O} (\log T)$ regret in the finite support setting beyond the non-degeneracy assumption. Our results significantly improve the state-of-the-art regret results and provide new insights for sequential decision-making.

Paper Structure (46 sections, 24 theorems, 141 equations, 3 figures, 3 tables, 5 algorithms)

Introduction
Contributions
Related Literature.
LP-based OLP Algorithms.
First-order OLP Algorithms.
Structure of the paper
Online linear programming with first-order methods
Notations.
OLP and duality
First-order methods on the dual problem
Performance metric
Main assumptions and summary of the results
Dual error bound and subgradient method
Dual error bound condition
Consequences of the dual error bound
...and 31 more sections

Key Result

Theorem 2.1

Under assumptions A1 to A3, the online subgradient method (eqn:osgm) with constant stepsize $\alpha_t \equiv \sqrt{\frac{2 \bar{c}}{m \underline{d} ( \bar{a} + \bar{d} )^2}} \cdot \tfrac{1}{\sqrt{T}}$ outputs $\hat{\mathbf{x}}_T$ such that
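
To make the stepsize in Theorem 2.1 concrete, here is a minimal Python sketch of a projected subgradient pass on the LP dual. The update rule itself is not given in the text above, so the form used here (accept an order when its reward exceeds the current dual price, then take a projected step) is an assumption based on the common dual formulation of OLP. The function names, the reading of $\bar{c}$, $\bar{a}$, $\bar{d}$, $\underline{d}$ as bounds on rewards, constraint coefficients, and per-period resources, and the synthetic uniform data are likewise illustrative choices, not the authors' implementation.

```python
import math
import random


def osgm_stepsize(T, m, c_bar, a_bar, d_bar, d_low):
    """Constant stepsize sqrt(2*c_bar / (m*d_low*(a_bar+d_bar)^2)) / sqrt(T)."""
    return math.sqrt(2.0 * c_bar / (m * d_low * (a_bar + d_bar) ** 2)) / math.sqrt(T)


def online_subgradient(rewards, columns, d, alpha):
    """One pass of a projected dual subgradient method over a stream (c_t, a_t).

    Assumed update (hypothetical, not spelled out in the summary):
      x_t     = 1 if c_t > <a_t, y_t> else 0
      y_{t+1} = max(y_t - alpha * (d - a_t * x_t), 0)   componentwise
    """
    y = [0.0] * len(d)  # dual prices start at zero
    xs = []
    for c_t, a_t in zip(rewards, columns):
        price = sum(a_i * y_i for a_i, y_i in zip(a_t, y))
        x_t = 1.0 if c_t > price else 0.0  # accept only if reward beats the price
        xs.append(x_t)
        # Subgradient of the dual sample objective is d - a_t * x_t;
        # project back onto the nonnegative orthant after the step.
        y = [max(y_i - alpha * (d_i - a_i * x_t), 0.0)
             for y_i, d_i, a_i in zip(y, d, a_t)]
    return xs, y


if __name__ == "__main__":
    rng = random.Random(0)  # synthetic data, for illustration only
    T, m = 1000, 2
    d = [0.5, 0.5]
    rewards = [rng.uniform(0.0, 1.0) for _ in range(T)]
    columns = [[rng.uniform(0.0, 1.0) for _ in range(m)] for _ in range(T)]
    alpha = osgm_stepsize(T, m, c_bar=1.0, a_bar=1.0, d_bar=0.5, d_low=0.5)
    xs, y = online_subgradient(rewards, columns, d, alpha)
    print(f"stepsize={alpha:.5f}, accepted={int(sum(xs))}, final dual={y}")
```

The stepsize scales as $1/\sqrt{T}$, so doubling the horizon shrinks each dual update by a factor of $\sqrt{2}$; the decisions `xs` play the role of $\hat{\mathbf{x}}_T$ in the theorem.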

Figures (3)

Figure 1: The exploration phase drives $\mathbf{y}^{T_e + 1}$ into a neighborhood of $\mathcal{Y}^\star$; in the exploitation phase, the iterates $\{\mathbf{y}^t\}_{t=T_e+1}^{T}$ stay localized in this neighborhood while remaining adaptive enough to make adjustments.
Figure 2: Growth of the normalized $r(\hat{\mathbf{x}}_T)+v(\hat{\mathbf{x}}_T)$ for different algorithms under continuous distributions.
Figure 3: Growth of the normalized $r(\hat{\mathbf{x}}_T)+v(\hat{\mathbf{x}}_T)$ for different algorithms under finite distributions.

Theorems & Definitions (34)

Theorem 2.1: Sublinear regret benchmark [gao2023solving; 10.48550/arxiv.2003.02513]
Theorem 2.2: Informal version of thm:final
Remark 3.1
Example 3.1: Continuous-support, non-degeneracy [li2022online; bray2019logarithmic; ma2024optimal]
Example 3.2: Finite-support, non-degeneracy [10.48550/arxiv.2101.11092]
Example 3.3: General growth
Lemma 3.1: Efficient learning algorithm
Lemma 3.2: Noise ball and last iterate convergence
Lemma 3.3: Dual convergence
Lemma 4.1: Regret
...and 24 more
