Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints

Zifeng Zhao, Feiyu Jiang, Yi Yu

TL;DR

The paper gives a tight characterization of contextual dynamic pricing under GLM demand, establishing an optimal regret rate of $\tilde{O}(\sqrt{dT})$, where $\tilde{O}$ hides logarithmic factors. It introduces two practical algorithms, a confidence-bound-based supCB (with a GLM revenue discretization) and an explore-then-commit (ETC) scheme, both of which attain this rate. The key observation is that the pricing action space is one-dimensional, so it can be discretized effectively even when the context dimension $d$ is high. Under local differential privacy (LDP), the authors propose a stochastic-gradient-descent-based ETC-LDP algorithm attaining regret $\tilde{O}(d\sqrt{T}/\varepsilon)$ and derive matching minimax lower bounds, which characterize the privacy-utility tradeoff. The work also covers mixed privacy and $(\varepsilon,\Delta)$-LDP settings, and supports the theory with extensive simulations and real-data applications. Overall, the results bridge dynamic pricing with and without LDP, offering both tight theory and practical algorithms for privacy-conscious online pricing under GLM demand.
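As a quick worked comparison of the two rates quoted above (ignoring constants and logarithmic factors, which the summary does not specify), dividing the LDP rate by the non-private rate gives the multiplicative cost of privacy:

```latex
\[
\frac{d\sqrt{T}/\varepsilon}{\sqrt{dT}}
  \;=\; \frac{d\sqrt{T}}{\varepsilon\,\sqrt{d}\,\sqrt{T}}
  \;=\; \frac{\sqrt{d}}{\varepsilon}.
\]
```

Under these rates, the privacy constraint inflates regret by a factor of order $\sqrt{d}/\varepsilon$, which grows with the context dimension and shrinks as the privacy parameter $\varepsilon$ is relaxed.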

Abstract

We study contextual dynamic pricing problems where a firm sells products to $T$ sequentially-arriving consumers, who behave according to an unknown demand model. The firm aims to minimize its regret over a clairvoyant that knows the model in advance. The demand follows a generalized linear model (GLM), allowing for stochastic feature vectors in $\mathbb R^d$ encoding product and consumer information. We first show the optimal regret is of order $\sqrt{dT}$, up to logarithmic factors, improving existing upper bounds by a $\sqrt{d}$ factor. This optimal rate is materialized by two algorithms: a confidence bound-type algorithm and an explore-then-commit (ETC) algorithm. A key insight is an intrinsic connection between dynamic pricing and contextual multi-armed bandit problems with many arms, obtained through a careful discretization. We further study contextual dynamic pricing under local differential privacy (LDP) constraints. We propose a stochastic gradient descent-based ETC algorithm achieving regret upper bounds of order $d\sqrt{T}/\varepsilon$, up to logarithmic factors, where $\varepsilon>0$ is the privacy parameter. The upper bounds with and without LDP constraints are matched by newly constructed minimax lower bounds, characterizing costs of privacy. Moreover, we extend our study to dynamic pricing under mixed privacy constraints, improving the privacy-utility tradeoff by leveraging public data. This is the first time such a setting is studied in the dynamic pricing literature, and our theoretical results seamlessly bridge dynamic pricing with and without LDP. Extensive numerical experiments and real data applications are conducted to illustrate the efficiency and practical value of our algorithms.
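The explore-then-commit idea described above can be illustrated with a minimal sketch. This is not the authors' algorithm: the logistic link, the demand index $x^\top a - p\,x^\top b$, the intercept-plus-uniform contexts, the price grid, the ridge-regularized Newton solver, and the names `fit_logistic` and `etc_pricing` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(v):
    # Clip to avoid overflow in exp for extreme indices.
    return 1.0 / (1.0 + np.exp(-np.clip(v, -30.0, 30.0)))

def fit_logistic(W, y, ridge=1.0, iters=25):
    """Ridge-regularized logistic MLE via Newton's method (illustrative solver)."""
    theta = np.zeros(W.shape[1])
    eye = np.eye(W.shape[1])
    for _ in range(iters):
        mu = sigmoid(W @ theta)
        grad = W.T @ (y - mu) - ridge * theta
        hess = (W * (mu * (1.0 - mu))[:, None]).T @ W + ridge * eye
        theta = theta + np.linalg.solve(hess, grad)
    return theta

def etc_pricing(T, d, tau, price_grid, theta_true, rng):
    """Explore for tau rounds (requires 1 <= tau < T), then commit to the fitted model.

    Assumed demand model (hypothetical): P(buy | x, p) = sigmoid(x @ a - p * (x @ b)).
    Regret is measured in expected revenue against the best price on the grid.
    """
    a, b = theta_true[:d], theta_true[d:]
    rows, ys = [], []
    theta_hat, regret = None, 0.0
    for t in range(T):
        # Assumed context distribution: intercept plus uniform features.
        x = np.concatenate([[1.0], rng.uniform(0.0, 1.0, size=d - 1)])
        probs = sigmoid(x @ a - price_grid * (x @ b))
        revenue = price_grid * probs  # true expected revenue at each grid price
        if t < tau:
            # Exploration: price drawn uniformly from the grid; record the purchase.
            k = rng.integers(len(price_grid))
            rows.append(np.concatenate([x, -price_grid[k] * x]))
            ys.append(rng.binomial(1, probs[k]))
        else:
            # Commitment: fit once, then price greedily under the estimate.
            # Purchases are not simulated here because they are no longer used.
            if theta_hat is None:
                theta_hat = fit_logistic(np.array(rows), np.array(ys))
            est = price_grid * sigmoid(
                x @ theta_hat[:d] - price_grid * (x @ theta_hat[d:])
            )
            k = int(np.argmax(est))
        regret += revenue.max() - revenue[k]
    return regret, theta_hat
```

For example, `etc_pricing(2000, 3, 200, np.linspace(0.5, 5.0, 50), np.concatenate([np.ones(3), 0.5 * np.ones(3)]), np.random.default_rng(0))` returns the cumulative regret and the fitted parameter vector. The exploration length of 200 is an arbitrary illustrative choice, not the tuning prescribed in the paper.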

Paper Structure

This paper contains 33 sections, 30 theorems, 266 equations, 13 figures, 3 tables, and 6 algorithms.

Key Result

Theorem S.3.1

Suppose Assumption assum_feature holds. For any $\delta \in (0,1)$, set $K=\sqrt{T/d}/\log(T)$, $\tau=\sqrt{dT}$ and $\alpha = \frac{3\sigma u M_{\psi2}}{\kappa} \cdot \sqrt{\log(3TKS/\delta)}$. Recall $S=\lfloor \log_2(T) \rfloor$. Provided that ..., we have that, with probability at least $1-\delta-2\log (T)/T$, the regret of the supCB algorithm in algorithm:SupCB is upper bounded by $R_T \leq B_{S3}\cdot \sqrt{dT}\cdots$
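The tuning quantities in the theorem can be evaluated directly. The sketch below assumes that $\log$ denotes the natural logarithm and treats $\sigma$, $u$, $M_{\psi2}$ and $\kappa$ as user-supplied constants, since their definitions are not given in this excerpt. The function name `supcb_parameters` is hypothetical.

```python
import math

def supcb_parameters(T, d, delta, sigma, u, M_psi2, kappa):
    """Evaluate S, K, tau and alpha as stated in Theorem S.3.1.

    Assumptions: log is the natural logarithm; sigma, u, M_psi2 and kappa
    are problem constants supplied by the caller. K and tau are returned
    unrounded because the excerpt does not say how they are rounded.
    """
    S = math.floor(math.log2(T))
    K = math.sqrt(T / d) / math.log(T)
    tau = math.sqrt(d * T)
    alpha = 3.0 * sigma * u * M_psi2 / kappa * math.sqrt(
        math.log(3.0 * T * K * S / delta)
    )
    return {"S": S, "K": K, "tau": tau, "alpha": alpha}
```

For instance, with $T = 10000$ and $d = 4$ this gives $S = 13$ and $\tau = 200$. The value of $\alpha$ depends on the constants passed in.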

Figures (13)

  • Figure S.1: supCB under (S1). [Left]: Mean regret (with C.I.) under different $(d,T)$. [Middle]: Mean regret (in log scale) with fitted regression lines. [Right]: Boxplot of regrets at different $T$ ($d=9$).
  • Figure S.2: Performance of ETC under (S2). [Left]: Mean regret (with C.I.) under different $(d,T)$. [Middle]: Mean regret (in log scale) with fitted linear regression lines ($(\beta_d,\beta_T)=(0.51,0.45)$). [Right]: Boxplot of regrets (based on 500 experiments) at different $T$ (with $d=9$).
  • Figure S.3: Performance of supCB under (S2). [Left]: Mean regret (with C.I.) under different $(d,T)$. [Middle]: Mean regret (in log scale) with fitted linear regression lines ($(\beta_d,\beta_T)=(0.47,0.51)$). [Right]: Boxplot of regrets (based on 500 experiments) at different $T$ (with $d=9$).
  • Figure S.4: Average computation time (with C.I.) of ETC and supCB under (S1).
  • Figure S.5: Mean regret (with C.I.) of ETC-Doubling, modified MLE-Cycle, and modified Semi-Myopic under (S2) with unknown $T$.
  • ...and 8 more figures

Theorems & Definitions (66)

  • Theorem S.3.1
  • Lemma S.3.1
  • Theorem S.3.2
  • Proof of \ref{lem:theta_err_ridge}
  • Theorem S.4.1
  • Proof of \ref{thm:UCB_adv}
  • Lemma S.4.1
  • Proof
  • Proposition S.5.1: Matrix Bernstein Inequality
  • Proof of \ref{prop:matrix_bern}
  • ...and 56 more