Localized exploration in contextual dynamic pricing achieves dimension-free regret
Jinhang Chai, Yaqi Duan, Jianqing Fan, Kaizheng Wang
TL;DR
This work studies contextual dynamic pricing under a linear demand model and introduces LetC (localized exploration-then-commit), a three-stage algorithm: burn-in exploration, localized refinement near the learned optimal pricing policy, and commitment to the learned policy. A novel critical inequality governs the exploration-exploitation trade-off. It yields a dimension-free regret bound once the horizon $T$ exceeds a polynomial in the covariate dimension, together with a non-asymptotic bound that covers all horizons; a minimax lower bound confirms optimality. The critical inequality mirrors its counterpart for the bias-variance trade-off in regularized (ridge-type) regression. Experiments on synthetic and real-world data support the theory, which suggests a principled route to online pricing with high-dimensional contexts, for example in online marketplaces.
Abstract
We study the problem of contextual dynamic pricing with a linear demand model. We propose a novel localized exploration-then-commit (LetC) algorithm which starts with a pure exploration stage, followed by a refinement stage that explores near the learned optimal pricing policy, and finally enters a pure exploitation stage. The algorithm is shown to achieve a minimax optimal, dimension-free regret bound when the time horizon exceeds a polynomial in the covariate dimension. Furthermore, we provide a general theoretical framework that covers the entire range of time horizons, demonstrating how to balance exploration and exploitation when the horizon is limited. The analysis is powered by a novel critical inequality that characterizes the exploration-exploitation trade-off in dynamic pricing, mirroring its existing counterpart for the bias-variance trade-off in regularized regression. Our theoretical results are validated by extensive experiments on synthetic and real-world data.
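The three-stage structure described above can be illustrated with a minimal sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the demand form $a(x) - b(x)\,p$ with $a(x) = x^\top\alpha$ and $b(x) = x^\top\beta$, the ridge estimator, the stage lengths `T1` and `T2`, the perturbation size `eta`, the price range, and the choice to refit on data pooled from both exploration stages. Under this assumed model the revenue-maximizing price is $a(x)/(2b(x))$, and the per-round regret of a price $p$ is $b(x)\,(p - p^\ast(x))^2$.

```python
import numpy as np

def fit_ridge(Z, y, lam=1e-3):
    """Ridge least squares: argmin ||Z theta - y||^2 + lam * ||theta||^2."""
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ y)

def greedy_price(theta, X, p_lo, p_hi, b_min=1e-2):
    """Price a/(2b) that maximizes p * (a - b p) under the fitted model, clipped."""
    d = X.shape[1]
    a = X @ theta[:d]
    b = np.maximum(X @ theta[d:], b_min)  # guard against tiny or negative slopes
    return np.clip(a / (2 * b), p_lo, p_hi)

def run_three_stage(T=3000, T1=500, T2=500, eta=0.1, noise=0.1, seed=0):
    """Hypothetical simulation: burn-in, localized refinement, then commit."""
    rng = np.random.default_rng(seed)
    alpha = np.array([2.0, 0.5, 0.5])   # assumed true intercept coefficients
    beta = np.array([1.0, 0.2, -0.2])   # assumed true price-sensitivity coefficients
    p_lo, p_hi = 0.5, 2.0
    X = np.column_stack([np.ones(T), rng.uniform(-0.5, 0.5, size=(T, 2))])

    def observe(Xb, p):
        # Noisy demand from the assumed linear model.
        return Xb @ alpha - p * (Xb @ beta) + noise * rng.standard_normal(len(p))

    def features(Xb, p):
        # Demand is linear in theta = (alpha, beta) with features [x, -p * x].
        return np.hstack([Xb, -p[:, None] * Xb])

    prices = np.empty(T)

    # Stage 1 (burn-in): prices drawn uniformly over the whole price range.
    s1 = slice(0, T1)
    prices[s1] = rng.uniform(p_lo, p_hi, size=T1)
    y1 = observe(X[s1], prices[s1])
    theta1 = fit_ridge(features(X[s1], prices[s1]), y1)

    # Stage 2 (localized refinement): small random perturbations around the
    # stage-1 greedy price, then a refit on the pooled data.
    s2 = slice(T1, T1 + T2)
    signs = rng.choice([-1.0, 1.0], size=T2)
    local = greedy_price(theta1, X[s2], p_lo, p_hi) + eta * signs
    prices[s2] = np.clip(local, p_lo, p_hi)
    y2 = observe(X[s2], prices[s2])
    Z = np.vstack([features(X[s1], prices[s1]), features(X[s2], prices[s2])])
    theta2 = fit_ridge(Z, np.concatenate([y1, y2]))

    # Stage 3 (commit): play the greedy price of the refined estimate.
    s3 = slice(T1 + T2, T)
    prices[s3] = greedy_price(theta2, X[s3], p_lo, p_hi)

    # Per-round regret against the true optimal price a/(2b).
    a, b = X @ alpha, X @ beta
    regret = b * (prices - a / (2 * b)) ** 2
    return prices, regret, (T1, T2)

if __name__ == "__main__":
    prices, regret, (T1, T2) = run_three_stage()
    print("mean regret, burn-in   :", regret[:T1].mean())
    print("mean regret, refinement:", regret[T1:T1 + T2].mean())
    print("mean regret, commit    :", regret[T1 + T2:].mean())
```

In this toy setup the refinement stage pays a regret of roughly $b(x)\,\eta^2$ per round for its perturbations, far less than pricing uniformly over the whole range, while still supplying price variation for the refit. How the stage lengths and perturbation size should scale with the horizon and the dimension is exactly what the paper's critical inequality addresses; the constants used here are arbitrary.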
