Localized exploration in contextual dynamic pricing achieves dimension-free regret

Jinhang Chai; Yaqi Duan; Jianqing Fan; Kaizheng Wang

TL;DR

This work tackles contextual dynamic pricing with a contextual linear demand model and introduces a three-stage LetC algorithm that combines burn-in exploration, localized refinement, and committing to a learned policy. A novel critical inequality governs the exploration-exploitation trade-off, enabling dimension-free regret bounds in the large-$T$ regime and a complete non-asymptotic bound for all horizons; a minimax lower bound confirms optimality. The analysis reveals a deep link between localized exploration and ridge-regression-type regularization, and demonstrates robust performance through extensive synthetic and real-data experiments. The results offer a principled route to dimension-robust online pricing in high-dimensional contexts with practical implications for online marketplaces.

Abstract

We study the problem of contextual dynamic pricing with a linear demand model. We propose a novel localized exploration-then-commit (LetC) algorithm which starts with a pure exploration stage, followed by a refinement stage that explores near the learned optimal pricing policy, and finally enters a pure exploitation stage. The algorithm is shown to achieve a minimax optimal, dimension-free regret bound when the time horizon exceeds a polynomial of the covariate dimension. Furthermore, we provide a general theoretical framework that encompasses the entire time spectrum, demonstrating how to balance exploration and exploitation when the horizon is limited. The analysis is powered by a novel critical inequality that depicts the exploration-exploitation trade-off in dynamic pricing, mirroring its existing counterpart for the bias-variance trade-off in regularized regression. Our theoretical results are validated by extensive experiments on synthetic and real-world data.
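The three stages described in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the demand form $D = x^\top\alpha - (x^\top\beta)\,p + \text{noise}$, the uniform burn-in prices, the random-sign perturbation of size `eta` in the localized stage, the ridge-stabilized least-squares fit, and all names (`fit_theta`, `greedy_price`, `letc_sketch`) and default parameter values.

```python
import numpy as np

def fit_theta(X, P, D, ridge=1e-6):
    """Least-squares fit of the assumed model D ~ x.alpha - (x.beta) * p."""
    Z = np.hstack([X, -P[:, None] * X])            # features in R^{2d}
    A = Z.T @ Z + ridge * np.eye(Z.shape[1])       # tiny ridge for stability
    return np.linalg.solve(A, Z.T @ D)

def greedy_price(theta, x, p_lo, p_hi):
    """Revenue-maximizing price under the assumed model, clipped to the range."""
    d = x.shape[0]
    intercept = x @ theta[:d]
    slope = max(x @ theta[d:], 1e-8)               # guard against a non-positive slope
    return float(np.clip(intercept / (2.0 * slope), p_lo, p_hi))

def letc_sketch(T, T1, T2, eta, d=4, p_lo=0.5, p_hi=3.0, noise=0.1, seed=0):
    """Simulate burn-in (T1 rounds), localized exploration (T2 rounds), then commit."""
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(1.0, 3.0, d) / d           # hypothetical true parameters
    beta = rng.uniform(0.5, 1.5, d) / d
    theta_true = np.concatenate([alpha, beta])
    X, P, D = np.empty((T, d)), np.empty(T), np.empty(T)
    regret = np.empty(T)
    theta_hat = None
    for t in range(T):
        x = rng.uniform(0.5, 1.5, d)               # positive contexts keep the slope positive
        if t < T1:
            p = rng.uniform(p_lo, p_hi)            # Stage 1: pure exploration
        else:
            if t == T1 or t == T1 + T2:
                # Refit on all data so far when Stage 2 and Stage 3 begin.
                theta_hat = fit_theta(X[:t], P[:t], D[:t])
            p = greedy_price(theta_hat, x, p_lo, p_hi)
            if t < T1 + T2:
                # Stage 2: explore only in a neighborhood of the learned policy.
                p = float(np.clip(p + eta * rng.choice([-1.0, 1.0]), p_lo, p_hi))
            # Stage 3 (t >= T1 + T2): commit to the learned policy unchanged.
        a, b = x @ alpha, x @ beta
        X[t], P[t], D[t] = x, p, a - b * p + noise * rng.standard_normal()
        p_star = greedy_price(theta_true, x, p_lo, p_hi)
        regret[t] = p_star * (a - b * p_star) - p * (a - b * p)
    return regret, theta_hat
```

Calling `letc_sketch(T=3000, T1=200, T2=800, eta=0.1)` returns the per-round regret and the final estimate. In this toy setting the per-round regret in the committing stage should be far smaller than during burn-in, since the localized stage keeps prices close to the learned policy while still supplying price variation for estimation.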

Paper Structure (89 sections, 10 theorems, 195 equations, 6 figures, 1 table, 1 algorithm)

Introduction
Our contributions
Related works
Paper organization
Notation
Problem set-up
Contextual linear demand model:
Optimal pricing policy:
Online learning process:
A localized explore-then-commit algorithm
Stage 1: Burn-in exploration.
Stage 2: Localized exploration.
Stage 3: Committing.
Doubling trick to make the algorithm fully online:
Further refining the algorithm by using time-varying perturbation $\eta$:
...and 74 more sections

Key Result

Lemma 1

The second-moment matrix ${\boldsymbol{S}^{\star}} \in \mathds{S}^{2 d}$ has the following property: Furthermore, under Assumptions asp:basic-regularity and asp:exploration, we have

Figures (6)

Figure 1: Illustrations of the structure of the critical inequality (eq:CI). (a) The left plot shows the singularity function $\eta \mapsto \mathcal{S}(\eta)$ from the left-hand side of the inequality and the function $\eta \mapsto \hbox{SNR}^{-1} \eta^{-2}$ from the right-hand side. The critical point $\eta^{\star}$ lies at the intersection of these two curves and is marked with a star. (b) The right plot illustrates the impact of varying the signal-to-noise ratio $\hbox{SNR}$ on the right-hand side curve. As $\hbox{SNR}$ decreases, indicating a more challenging problem, the curve shifts upwards, and the critical value $\eta^{\star}$ moves to the right, that is, toward larger values. On the $y$-axis of both plots, $\tilde{d}^*$ stands for the degenerate dimension (eq:EffDim) at $\eta=\eta^{\star}$.
Figure 2: (Log-log) plot of the regret upper bound versus the time horizon $T$, omitting constant and logarithmic factors. (i) Initially, when $T = \widetilde{{O}}(d)$, the regret grows linearly. (ii) At the first transition point $T = {\widetilde{\Theta}}(d)$, it shifts smoothly into the square-root regime $\widetilde{{O}}(\sqrt{d \, T})$. (iii) In the intermediate range $T = \widetilde{\Omega}(d)$ and $T = \widetilde{{O}}(d^4)$, the regret scales as ${O}(\sqrt{\widetilde{d} \, T})$, with $\widetilde{d}$ decreasing from $d$ to $1$. (iv) Beyond the second transition point at $T = {\widetilde{\Theta}}(d^4)$, the regret becomes dimension-free, scaling as $\widetilde{{O}}(\sqrt{T})$.
Figure 3: Regret of the LetC algorithm. Panel (a) depicts the LetC regret versus the time $T$ in the original scale, while Panel (b) displays the same quantity in the log-log scale. In both panels, the $x$-axis is the time $T$, and the $y$-axis is the regret up to time $T$. Different curves correspond to different dimensions: $d=4$ is blue, $d=8$ is orange, $d=16$ is green, $d=32$ is red, and $d=64$ is purple. For $T$ large enough, the regret grows as the square root of $T$ and is independent of the dimension.
Figure 4: Regret of the LetC algorithm with the doubling trick, plotted against time $T$ on a log-log scale. In this approach, each time segment is twice as long as the previous one, and the estimation is restarted at the beginning of each new segment. Different curves correspond to different dimensions: $d=4$ is blue, $d=8$ is orange, $d=16$ is green, $d=32$ is red, and $d=64$ is purple.
Figure 5: Histogram of revenue improvement. We plot the histogram of the revenue improvement of our LetC algorithm over the offline policy for a total of 105 products. The revenue is calculated as expected demand $\times$ price, aggregated over a one-year period. The revenue improvement is calculated as $(\hbox{revenue(LetC)}-\hbox{revenue(offline)}) / \hbox{revenue(offline)} \times 100\%$. We also plot the fitted density curve obtained by kernel density estimation. Compared to the offline policy, our algorithm achieves more than 5% revenue improvement for most products.
...and 1 more figures
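
The doubling schedule mentioned in the Figure 4 caption (segments that grow by a factor of 2, with estimation restarted in each) can be sketched as follows. This is only an illustration of the segment boundaries. The function name `doubling_segments` and the `first_length` parameter are hypothetical, and how the algorithm is tuned within each segment is not shown here.

```python
def doubling_segments(total_rounds, first_length=1):
    """Split rounds [0, total_rounds) into consecutive segments of doubling length.

    Returns a list of (start, end) pairs with end exclusive. The last segment
    is truncated if the total number of rounds ends in the middle of it.
    """
    segments = []
    start, length = 0, first_length
    while start < total_rounds:
        end = min(start + length, total_rounds)
        segments.append((start, end))
        start, length = end, 2 * length   # next segment is twice as long
    return segments
```

For example, `doubling_segments(10)` yields `[(0, 1), (1, 3), (3, 7), (7, 10)]`. An online procedure would restart its estimation at each `start` and treat the segment length as a known horizon, so the overall horizon never needs to be specified in advance.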

Theorems & Definitions (17)

remark
remark
Lemma 1
Theorem 1: Dimension-free regret upper bound
Lemma 2
proof
example 1
Theorem 2: A general regret upper bound
Lemma 3: Estimation error in $\widehat{\boldsymbol{\theta}}$ from Stage 2 (informal)
Theorem 3
...and 7 more
