Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints

Zifeng Zhao; Feiyu Jiang; Yi Yu

TL;DR

The paper gives a tight characterization of contextual dynamic pricing under GLM demand, establishing an optimal regret rate of $\tilde{O}(\sqrt{dT})$. It introduces two practical algorithms, a confidence-bound-based supCB (with a GLM revenue discretization) and an explore-then-commit (ETC) scheme, both of which achieve this near-optimal rate. The key observation is that the pricing action space is one-dimensional, so it can be discretized effectively even when the context dimension $d$ is high. Under local differential privacy (LDP), the authors propose a stochastic-gradient-descent-based ETC-LDP algorithm with regret $\tilde{O}(d\sqrt{T}/\varepsilon)$ and derive matching minimax lower bounds that quantify the privacy-utility tradeoff. The work also covers mixed privacy and $(\varepsilon,\Delta)$-LDP settings and supports the theory with simulations and real-data applications. Overall, the results bridge dynamic pricing with and without LDP, offering both tight theory and practical algorithms for privacy-conscious online pricing under GLM demand.

Abstract

We study contextual dynamic pricing problems where a firm sells products to $T$ sequentially-arriving consumers, behaving according to an unknown demand model. The firm aims to minimize its regret over a clairvoyant that knows the model in advance. The demand follows a generalized linear model (GLM), allowing for stochastic feature vectors in $\mathbb R^d$ encoding product and consumer information. We first show the optimal regret is of order $\sqrt{dT}$, up to logarithmic factors, improving existing upper bounds by a $\sqrt{d}$ factor. This optimal rate is materialized by two algorithms: a confidence bound-type algorithm and an explore-then-commit (ETC) algorithm. A key insight is an intrinsic connection between dynamic pricing and contextual multi-armed bandit problems with many arms, obtained through a careful discretization. We further study contextual dynamic pricing under local differential privacy (LDP) constraints. We propose a stochastic gradient descent-based ETC algorithm achieving regret upper bounds of order $d\sqrt{T}/\varepsilon$, up to logarithmic factors, where $\varepsilon>0$ is the privacy parameter. The upper bounds with and without LDP constraints are matched by newly constructed minimax lower bounds, characterizing costs of privacy. Moreover, we extend our study to dynamic pricing under mixed privacy constraints, improving the privacy-utility tradeoff by leveraging public data. This is the first time such a setting is studied in the dynamic pricing literature and our theoretical results seamlessly bridge dynamic pricing with and without LDP. Extensive numerical experiments and real data applications are conducted to illustrate the efficiency and practical value of our algorithms.
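As a rough illustration of the explore-then-commit idea mentioned in the abstract, the Python sketch below simulates a toy pricing problem. Everything specific in it is an assumption made for illustration and is not taken from the paper: the logistic demand with purchase probability sigmoid(x·alpha − p·(x·beta)), the uniform features, the 200-point price grid, the ridge-regularised Newton fit, and the exploration length ⌊√(dT)⌋. The function names `run_etc`, `fit_logistic`, and `best_price` are hypothetical.

```python
import numpy as np

def sigmoid(v):
    # Clip the logit so that exp() cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(v, -30.0, 30.0)))

def fit_logistic(W, y, steps=25, ridge=1.0):
    """Ridge-regularised Newton iterations for a logistic-regression MLE."""
    theta = np.zeros(W.shape[1])
    for _ in range(steps):
        mu = sigmoid(W @ theta)
        grad = W.T @ (y - mu) - ridge * theta
        hess = (W * (mu * (1.0 - mu))[:, None]).T @ W + ridge * np.eye(W.shape[1])
        theta = theta + np.linalg.solve(hess, grad)
    return theta

def best_price(x, alpha, beta, grid):
    """Grid price maximising expected revenue p * P(sale | x, p)."""
    revenue = grid * sigmoid(x @ alpha - grid * (x @ beta))
    return grid[np.argmax(revenue)]

def run_etc(T=2000, d=3, seed=0):
    rng = np.random.default_rng(seed)
    alpha_true = np.full(d, 1.0)       # assumed ground truth (illustrative)
    beta_true = np.full(d, 0.5)        # assumed price sensitivity (illustrative)
    grid = np.linspace(0.5, 5.0, 200)  # one-dimensional price grid
    tau = int(np.sqrt(d * T))          # assumed exploration length
    W, y = [], []
    alpha_hat = beta_hat = None
    regret = 0.0
    for t in range(T):
        x = rng.uniform(0.5, 1.0, size=d)
        if t < tau:
            p = rng.choice(grid)       # explore: random grid price
        else:
            if alpha_hat is None:      # fit once, when exploration ends
                theta = fit_logistic(np.array(W), np.array(y))
                alpha_hat, beta_hat = theta[:d], theta[d:]
            p = best_price(x, alpha_hat, beta_hat, grid)  # commit
        prob = sigmoid(x @ alpha_true - p * (x @ beta_true))
        if t < tau:
            # Record the binary sale outcome with features [x, -p * x].
            W.append(np.concatenate([x, -p * x]))
            y.append(float(rng.random() < prob))
        # Expected-revenue regret against the best grid price under the truth.
        p_star = best_price(x, alpha_true, beta_true, grid)
        prob_star = sigmoid(x @ alpha_true - p_star * (x @ beta_true))
        regret += p_star * prob_star - p * prob
    return regret, tau
```

Calling `run_etc()` returns the cumulative expected-revenue regret relative to the best grid price, together with the exploration length. Because the price is a scalar, the grid search in `best_price` stays cheap regardless of the feature dimension.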

Paper Structure

This paper contains 33 sections, 30 theorems, 266 equations, 13 figures, 3 tables, 6 algorithms.

Literature review
Dynamic pricing with demand learning
Contextual multi-armed bandit
Differential privacy for online learning
Additional numerical results
Additional results for ETC and supCB with known $T$
Additional results for experiments on synthetic data
Additional results for real data analysis
The supCB algorithm
A modified supCB for adversarial contexts
Technical details
Dynamic pricing under adversarial contexts
Numerical experiments
Proof of \ref{thm:UCB_adv}
Technical details in \ref{sec-without-LDP} and \ref{sec-supCB}
...and 18 more sections

Key Result

Theorem S.3.1

Suppose Assumption \ref{assum_feature} holds. For any $\delta \in (0,1)$, set $K=\sqrt{T/d}/\log(T)$, $\tau=\sqrt{dT}$, and $\alpha = \frac{3\sigma u M_{\psi2}}{\kappa} \cdot \sqrt{\log(3TKS/\delta)}$, where $S=\lfloor \log_2(T) \rfloor$. Provided that …, with probability at least $1-\delta-2\log (T)/T$, the regret of the supCB algorithm in Algorithm \ref{algorithm:SupCB} is upper bounded by $R_T \leq B_{S3}\cdot \sqrt{dT\,\cdots}$
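To make the tuning choices in the theorem concrete, the following Python sketch evaluates $K$, $\tau$, $S$, and $\alpha$ from the stated formulas. The function name `supcb_tuning` and the constant values in the usage note are hypothetical; the problem-dependent constants $\sigma$, $u$, $M_{\psi2}$, and $\kappa$ are not specified in this summary and must be supplied by the user.

```python
import math

def supcb_tuning(T, d, delta, sigma, u, M_psi2, kappa):
    """Evaluate the tuning parameters stated in Theorem S.3.1.

    sigma, u, M_psi2 and kappa are problem-dependent constants whose
    values are not given here; they are plain inputs (assumption).
    """
    S = math.floor(math.log2(T))         # S = floor(log_2 T)
    K = math.sqrt(T / d) / math.log(T)   # K = sqrt(T/d) / log(T)
    tau = math.sqrt(d * T)               # tau = sqrt(dT)
    alpha = (3 * sigma * u * M_psi2 / kappa) * math.sqrt(
        math.log(3 * T * K * S / delta)
    )
    return {"S": S, "K": K, "tau": tau, "alpha": alpha}
```

For instance, `supcb_tuning(T=10_000, d=4, delta=0.05, sigma=1.0, u=1.0, M_psi2=1.0, kappa=1.0)` gives $S = 13$ and $\tau = 200$ with these placeholder constants. In practice $K$ and $\tau$ would presumably be rounded to integers, which the formulas above do not specify.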

Figures (13)

Figure S.1: supCB under (S1). [Left]: Mean regret (with C.I.) under different $(d,T)$. [Middle]: Mean regret (in log scale) with fitted regression lines. [Right]: Boxplot of regrets at different $T$ ($d=9$).
Figure S.2: Performance of ETC under (S2). [Left]: Mean regret (with C.I.) under different $(d,T)$. [Middle]: Mean regret (in log scale) with fitted linear regression lines ($(\beta_d,\beta_T)=(0.51,0.45)$). [Right]: Boxplot of regrets (based on 500 experiments) at different $T$ (with $d=9$).
Figure S.3: Performance of supCB under (S2). [Left]: Mean regret (with C.I.) under different $(d,T)$. [Middle]: Mean regret (in log scale) with fitted linear regression lines ($(\beta_d,\beta_T)=(0.47,0.51)$). [Right]: Boxplot of regrets (based on 500 experiments) at different $T$ (with $d=9$).
Figure S.4: Average computation time (with C.I.) of ETC and supCB under (S1).
Figure S.5: (S2) Mean regret (with C.I.) of ETC-Doubling, modified MLE-Cycle, and modified Semi-Myopic with unknown $T$.
...and 8 more figures

Theorems & Definitions (66)

Theorem S.3.1
Lemma S.3.1
Theorem S.3.2
Proof of \ref{lem:theta_err_ridge}
Theorem S.4.1
Proof of \ref{thm:UCB_adv}
Lemma S.4.1
Proof
Proposition S.5.1: Matrix Bernstein Inequality
Proof of \ref{prop:matrix_bern}
...and 56 more
