Demand Balancing in Primal-Dual Optimization for Blind Network Revenue Management
Sentao Miao, Yining Wang
TL;DR
This work addresses blind network revenue management with unknown nonparametric demand by introducing PD-NRM, a primal-dual gradient-based algorithm that updates dual variables infrequently and uses demand balancing to control primal feasibility. The method learns demand statistics through a two-phase gradient-estimation procedure and pairs each price with a second, balancing price to offset violations of complementary slackness on the resource inventory constraints, all within an epoch-based framework that reduces computational overhead. The authors prove a nearly optimal regret bound of $\tilde{O}(N^{3.25}\sqrt{T})$ with no additional higher-order $o(\sqrt{T})$ terms, and numerical experiments show improved practical performance over benchmark algorithms. The approach offers scalable, first-order optimization for nonparametric demand learning under inventory constraints, with potential extensions to broader online resource-constrained optimization problems.
Abstract
This paper proposes a practically efficient algorithm with optimal theoretical regret for the classical network revenue management (NRM) problem with unknown, nonparametric demand. Over a time horizon of length $T$, in each time period the retailer decides the prices of $N$ types of products, which are produced from $M$ types of resources with non-replenishable initial inventory. For nonparametric demand under some mild assumptions, Miao and Wang (2021) were the first to propose an algorithm with regret of order $O(\text{poly}(N,M,\ln(T))\sqrt{T})$ (in particular, $\tilde O(N^{3.5}\sqrt{T})$ plus additional high-order terms that are $o(\sqrt{T})$ for sufficiently large $T\gg N$). In this paper, we improve on that result by proposing a primal-dual optimization algorithm that is not only more practical but also achieves an improved regret of $\tilde O(N^{3.25}\sqrt{T})$, free from additional high-order terms. A key technical contribution of the proposed algorithm is the so-called demand balancing, which pairs the primal solution (i.e., the price) in each time period with another price to offset the violation of complementary slackness on resource inventory constraints. Numerical experiments comparing the algorithm against several benchmarks further illustrate its effectiveness.
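
The demand-balancing idea can be illustrated on a toy instance. The sketch below is not the authors' PD-NRM procedure: it assumes a single product, a single resource, and a known linear demand curve $d(p) = a - bp$, whereas the paper treats unknown, nonparametric demand that must be learned. The function names and the parameters `a`, `b`, `lam`, and `target_rate` are hypothetical, introduced only for illustration. Given a dual variable `lam`, the primal price maximizes the Lagrangian revenue $(p - \lambda)\,d(p)$; a second price is then chosen so that the average demand over the pair matches the per-period inventory rate.

```python
def demand(p, a, b):
    """Toy linear demand d(p) = a - b * p (assumed known here; unknown in the paper)."""
    return a - b * p


def primal_price(lam, a, b):
    """Price maximizing the Lagrangian revenue (p - lam) * (a - b * p).

    Setting the derivative a - 2*b*p + b*lam to zero gives p = (a / b + lam) / 2.
    """
    return (a / b + lam) / 2.0


def balancing_price(p, target_rate, a, b):
    """Second price whose demand offsets the gap left by the primal price p.

    Chosen so that the average demand of the two prices equals target_rate,
    the per-period inventory consumption rate (a hypothetical parameter).
    """
    paired_demand = 2.0 * target_rate - demand(p, a, b)
    return (a - paired_demand) / b


if __name__ == "__main__":
    a, b = 10.0, 1.0      # toy demand parameters (assumptions)
    lam = 0.0             # current dual variable
    target_rate = 4.0     # inventory available per period

    p1 = primal_price(lam, a, b)
    p2 = balancing_price(p1, target_rate, a, b)
    avg = (demand(p1, a, b) + demand(p2, a, b)) / 2.0
    print(p1, p2, avg)
```

In this toy instance the primal price alone would consume inventory faster than the target rate, so the paired price is higher and pulls average demand back to the target. In the blind setting the demand curve is not available, so any such pairing would have to rely on estimated quantities.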
