Joint Pricing and Resource Allocation: An Optimal Online-Learning Approach

Jianyu Xu, Xuan Wang, Yu-Xiang Wang, Jiashuo Jiang

TL;DR

The paper tackles online joint pricing and inventory allocation under price-dependent, stochastic demand with perishable stock. It introduces a two-stage stochastic programming view and a hierarchical online-learning algorithm that uses a lower-confidence-bound meta-strategy over multiple local zeroth-order OCO agents to manage non-convexity and bandit feedback. The authors prove a near-optimal regret bound of $\tilde{O}(\sqrt{Tmn})$, demonstrating a principled integration of statistical learning techniques with operations research for online pricing and allocation. The approach provides a path toward scalable, data-driven decision-making in dynamic, multi-supplier marketplaces and highlights opportunities for extending to non-linear demands, censored feedback, and fairness considerations.

Abstract

We study an online learning problem on dynamic pricing and resource allocation, where we make joint pricing and inventory decisions to maximize the overall net profit. We consider the stochastic dependence of demands on the price, which complicates the resource allocation process and introduces significant non-convexity and non-smoothness to the problem. To solve this problem, we develop an efficient algorithm that utilizes a "Lower-Confidence Bound (LCB)" meta-strategy over multiple OCO agents. Our algorithm achieves $\tilde{O}(\sqrt{Tmn})$ regret (for $m$ suppliers and $n$ consumers), which is optimal with respect to the time horizon $T$. Our results illustrate an effective integration of statistical learning methodologies with complex operations research problems.
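The abstract names the idea but not its mechanics, so a toy sketch may help make "an LCB meta-strategy over multiple OCO agents" concrete. The Python below is not the paper's algorithm. It assumes a single scalar price in $[0,1]$ split into equal intervals, a made-up convex loss standing in for negative profit, a one-point zeroth-order gradient step inside each interval, and a generic confidence radius $\sqrt{2\log T / N_K}$. The names `LocalAgent`, `toy_loss`, and `run_meta`, and all constants, are hypothetical.

```python
import math
import random


class LocalAgent:
    """Hypothetical zeroth-order learner confined to one price interval."""

    def __init__(self, lo, hi, step=0.002, delta=0.02):
        self.lo, self.hi = lo, hi
        self.price = 0.5 * (lo + hi)   # current iterate
        self.step, self.delta = step, delta
        self.sign = 1.0                # last perturbation direction
        self.count = 0                 # times this agent was selected
        self.mean_loss = 0.0           # running average of observed losses

    def _clip(self, x):
        return min(self.hi, max(self.lo, x))

    def propose(self, rng):
        # Perturb the iterate in a random direction; only the loss at the
        # perturbed point is observed afterwards (bandit feedback).
        self.sign = rng.choice((-1.0, 1.0))
        return self._clip(self.price + self.delta * self.sign)

    def update(self, loss):
        # One-point gradient estimate in one dimension, then a clipped step.
        grad_est = loss * self.sign / self.delta
        self.price = self._clip(self.price - self.step * grad_est)
        self.count += 1
        self.mean_loss += (loss - self.mean_loss) / self.count

    def lcb(self, horizon):
        # Unvisited agents get -inf so each one is tried at least once.
        if self.count == 0:
            return -math.inf
        return self.mean_loss - math.sqrt(2.0 * math.log(horizon) / self.count)


def toy_loss(price, rng):
    # Assumed stand-in for negative profit: convex in price, bounded noise.
    return abs(price - 0.7) + rng.uniform(-0.05, 0.05)


def run_meta(horizon, seed=0, num_agents=4):
    rng = random.Random(seed)
    edges = [i / num_agents for i in range(num_agents + 1)]
    agents = [LocalAgent(edges[i], edges[i + 1]) for i in range(num_agents)]
    for _ in range(horizon):
        # Meta-strategy: select the agent with the smallest lower confidence
        # bound on its loss; only that agent acts and learns this round.
        chosen = min(agents, key=lambda a: a.lcb(horizon))
        loss = toy_loss(chosen.propose(rng), rng)
        chosen.update(loss)
    return [a.count for a in agents], agents
```

Calling `run_meta(3000)` returns the per-agent selection counts together with the agents themselves. In this toy setting the interval far from the loss minimizer is abandoned once its lower confidence bound rises above that of a better interval, which is the behaviour the meta-strategy is meant to produce.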

Paper Structure

This paper contains 33 sections, 16 theorems, 46 equations, 5 algorithms.

Key Result

Lemma 3.1

The function $g(\vec{I}, p, \vec{D})$ defined in eq:g_function is marginally convex on $\vec{I}$ and on $\vec{D}$.
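
Assuming "marginally convex" carries its usual meaning (convex in one argument while the others are held fixed), the lemma can be unpacked as the two inequalities below. The weight $\lambda \in [0,1]$ and the primed vectors $\vec{I}'$, $\vec{D}'$ are notation introduced here for illustration and do not come from the paper.

```latex
\begin{align*}
g\bigl(\lambda \vec{I} + (1-\lambda)\vec{I}',\, p,\, \vec{D}\bigr)
  &\le \lambda\, g(\vec{I}, p, \vec{D}) + (1-\lambda)\, g(\vec{I}', p, \vec{D}), \\
g\bigl(\vec{I},\, p,\, \lambda \vec{D} + (1-\lambda)\vec{D}'\bigr)
  &\le \lambda\, g(\vec{I}, p, \vec{D}) + (1-\lambda)\, g(\vec{I}, p, \vec{D}').
\end{align*}
```

Under this reading the lemma asserts neither joint convexity in $(\vec{I}, \vec{D})$ nor convexity in the price $p$.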

Theorems & Definitions (25)

  • Lemma 3.1
  • Lemma 3.3
  • Lemma 3.4
  • Definition 3.5: Regret
  • Theorem 5.1: Regret
  • Lemma 5.2: Sub-regret of every $\mathcal{A}_K$
  • Lemma 5.3: Validity of $\Delta_K$
  • Corollary 5.4
  • Lemma 5.5
  • Lemma B.1
  • ...and 15 more