Improved Algorithms for Contextual Dynamic Pricing

Matilde Tullii; Solenne Gaucher; Nadav Merlis; Vianney Perchet

TL;DR

The paper tackles contextual dynamic pricing, where a seller must maximize revenue by posting prices based on covariates while receiving only binary feedback. It introduces VAPE, a valuation-approximation and price-elimination framework that decouples learning the context-dependent valuation $g(x)$ from estimating the demand and shares information across contexts. In the linear valuation setting, VAPE achieves a minimax-optimal $\tilde{O}(T^{2/3})$ regret, and in the non-parametric Hölder setting it attains a rate of $\tilde{O}(T^{(d+2\beta)/(d+3\beta)})$ under mild Lipschitz noise assumptions; both results improve over prior bounds and rely on adaptive, cross-context learning. The work offers a principled scheme for contextual pricing with minimal regularity requirements, with implications for revenue management and related online learning problems under contextual feedback.
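To get a feel for how the two quoted rates relate, the exponent of $T$ in the non-parametric rate can be evaluated for a few values of $d$ and $\beta$. The snippet below is only an arithmetic sketch of the formula $(d+2\beta)/(d+3\beta)$ stated above; the function name `holder_regret_exponent` is hypothetical and not taken from the paper.

```python
from fractions import Fraction


def holder_regret_exponent(d: int, beta: Fraction) -> Fraction:
    """Exponent of T in the quoted rate T^((d + 2*beta) / (d + 3*beta)).

    `d` is the context dimension and `beta` the Hölder exponent.
    This helper is illustrative only; it just evaluates the formula.
    """
    if d < 0 or beta <= 0:
        raise ValueError("need d >= 0 and beta > 0")
    return (d + 2 * beta) / (d + 3 * beta)


# Example: dimension 1 with a Lipschitz-like exponent beta = 1.
print(holder_regret_exponent(1, Fraction(1)))  # 3/4
```

For any $d \geq 1$ the exponent lies strictly between $2/3$ and $1$, and it equals $2/3$ when $d$ is formally set to $0$, which matches the exponent of the linear rate.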

Abstract

In contextual dynamic pricing, a seller sequentially prices goods based on contextual information. Buyers will purchase products only if the prices are below their valuations. The goal of the seller is to design a pricing strategy that collects as much revenue as possible. We focus on two different valuation models. The first assumes that valuations depend linearly on the context and are further distorted by noise. Under minor regularity assumptions, our algorithm achieves an optimal regret bound of $\tilde{\mathcal{O}}(T^{2/3})$, improving the existing results. The second model removes the linearity assumption, requiring only that the expected buyer valuation is $\beta$-Hölder in the context. For this model, our algorithm obtains a regret of $\tilde{\mathcal{O}}(T^{(d+2\beta)/(d+3\beta)})$, where $d$ is the dimension of the context space.
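The interaction described in the abstract can be sketched as a small simulation: a buyer purchases only if the posted price does not exceed a noisy linear valuation, and the seller observes only whether the sale happened. This is a minimal sketch, not the VAPE algorithm. The uniform noise, the uniform contexts, the fixed-price policy, and the names `simulate_round` and `run_fixed_price` are all assumptions made for illustration.

```python
import random


def simulate_round(theta, x, price, noise_bound, rng):
    """One round under an assumed linear valuation model.

    valuation = <theta, x> + noise, with noise uniform on
    [-noise_bound, noise_bound] (an illustrative choice).
    Returns (sold, revenue); the seller only ever observes `sold`.
    """
    expected_valuation = sum(t * c for t, c in zip(theta, x))
    valuation = expected_valuation + rng.uniform(-noise_bound, noise_bound)
    sold = price <= valuation  # binary feedback
    revenue = price if sold else 0.0
    return sold, revenue


def run_fixed_price(theta, price, horizon, noise_bound=0.1, seed=0):
    """Total revenue of a naive policy posting the same price every round."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(horizon):
        # Contexts drawn uniformly from [0, 1]^d (assumption).
        x = [rng.uniform(0.0, 1.0) for _ in theta]
        _, revenue = simulate_round(theta, x, price, noise_bound, rng)
        total += revenue
    return total


if __name__ == "__main__":
    print(run_fixed_price(theta=[0.5, 0.5], price=0.3, horizon=1000))
```

A context-aware policy would replace the fixed `price` with a function of `x`; the regret compares its revenue to that of the best such pricing rule.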

Paper Structure (31 sections, 14 theorems, 86 equations, 2 figures, 1 table, 3 algorithms)


Introduction
Related Work
Outline and Contributions
Preliminaries
Model and Notations
Assumptions
Information Sharing in Contextual Dynamic Pricing
Algorithmic Approach
Outline of the Algorithm
A First Bound on the Regret
Linear Valuation Functions
Non-Parametric Valuation Functions
Conclusions
Simulations
VAPE
...and 16 more sections

Key Result

Theorem 1

Assume that the valuations follow the model given by Equations eq:contextual_model and eq:linear_model. Under Assumptions hyp:context, hyp:noise, and hyp:theta, the regret of Algorithm VAPE for Linear Valuations with parameters $\epsilon = (d^2\log(T)^2/T)^{1/3}$, $\mu = \epsilon/\left(B_y\sqrt{d\,\cdots}\right)$, […] with probability $1-\tilde{\mathcal{O}}(T^{-1})$, where $C_{B_{\xi}, B_{x}, B_{\theta}, L_{\xi}}$ is […]
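The one parameter that is fully visible in the statement, $\epsilon = (d^2\log(T)^2/T)^{1/3}$, can be evaluated directly. The sketch below assumes the natural logarithm (the base is not specified in the excerpt) and uses a hypothetical helper name, `vape_epsilon`; it does not reproduce any other part of the algorithm.

```python
import math


def vape_epsilon(d: int, T: int) -> float:
    """Evaluate epsilon = (d^2 * log(T)^2 / T)^(1/3).

    Assumes the natural logarithm; `d` is the context dimension
    and `T` the time horizon. Illustrative helper only.
    """
    if d < 1 or T < 2:
        raise ValueError("need d >= 1 and T >= 2")
    return (d ** 2 * math.log(T) ** 2 / T) ** (1.0 / 3.0)


if __name__ == "__main__":
    for horizon in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
        eps = vape_epsilon(d=2, T=horizon)
        # T * eps equals (d * log(T) * T)^(2/3) by simple algebra.
        print(horizon, round(eps, 4), round(horizon * eps, 1))
```

As a purely arithmetic observation, $T\epsilon = (dT\log T)^{2/3}$, so $T\epsilon$ grows as $T^{2/3}$ up to logarithmic factors, the same order as the regret rate quoted in the abstract.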

Figures (2)

Figure 1: Regret rate of VAPE for linear valuations, in standard scale (left) and logarithmic scale (right). The solid lines represent the average performance over $15$ repetitions of the routine. The faded red area shows the standard error, and in the right subplot the dotted line corresponds to the theoretical regret bound.
Figure 2: Comparison between VAPE and the algorithm in fan2024policy in the stochastic and adversarial cases, with time horizons $T\in\{1000, 1700, 3000, 5000\}$ (left subplot) and $T\in\{1000, 1400, 4200, 9000\}$ (right subplot). In both cases the solid lines represent the average regret rate across the $15$ repetitions of the simulations, and the faded area shows the standard error. In the right subplot, due to the specificity of the setting, the variance across runs is minimal, so the faded area is not visible. In both cases the regret is plotted in logarithmic scale.

Theorems & Definitions (15)

Claim 1
Theorem 1
Theorem 2
Lemma 1
Lemma 2
Lemma 3
Lemma 4
Lemma 5
Lemma 6
Lemma 7
...and 5 more
