Dynamic Pricing and Learning with Long-term Reference Effects

Shipra Agrawal; Wei Tang

TL;DR

This paper introduces the averaging reference mechanism (ARM) to model long-term reference-price effects in dynamic pricing. It shows that fixed-price policies can incur linear regret under ARM, while markdown pricing is near-optimal: a markdown policy is optimal when customers are gain-seeking, and near-optimal up to a logarithmic gap when customers are loss-averse. For linear base demand, the authors provide a detailed structural characterization of near-optimal markdown price curves and develop an efficient algorithm to compute them. When demand parameters are unknown, they design a learning algorithm that achieves a near-optimal regret of $\tilde{O}(\bar{p}^{3.5}\sqrt{T})$ and provide matching lower bounds, addressing the non-stationary nature of the ARM dynamic programming problem. The work links theoretical insights to practical pricing heuristics (markdowns) and opens directions for broader demand models, adaptive exploration, and price-reference interactions in realistic marketplaces.

Abstract

We consider a dynamic pricing problem where customer response to the current price is impacted by the customer's price expectation, also known as the reference price. We study a simple and novel reference price mechanism where the reference price is the average of the past prices offered by the seller. As opposed to the more commonly studied exponential smoothing mechanism, in our reference price mechanism the prices offered by the seller have a longer-term effect on future customer expectations. We show that under this mechanism, a markdown policy is near-optimal irrespective of the parameters of the model. This matches the common intuition that a seller may be better off starting with a higher price and then decreasing it, as customers feel they are getting bargains on items that are ordinarily more expensive. For linear demand models, we also provide a detailed characterization of the near-optimal markdown policy along with an efficient way of computing it. We then consider a more challenging dynamic pricing and learning problem, where the demand model parameters are a priori unknown, and the seller needs to learn them online from the customers' responses to the offered prices while simultaneously optimizing revenue. The objective is to minimize regret, i.e., the $T$-round revenue loss compared to a clairvoyant optimal policy. This task essentially amounts to learning a non-stationary optimal policy in a time-variant Markov Decision Process (MDP). For linear demand models, we provide an efficient learning algorithm with an optimal $\tilde{O}(\sqrt{T})$ regret upper bound.
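
The contrast between averaging and exponential smoothing can be made concrete with a small sketch. The abstract only states that the reference price is the average of past prices; the code below additionally assumes that the initial reference price `r1` counts as one observation in that average, and the function names and the smoothing weight `alpha` are hypothetical choices made for illustration.

```python
def arm_reference_prices(prices, r1):
    """Reference price seen in each round under an averaging mechanism.

    Assumption (not stated in the abstract): r1 is treated as one past
    observation, so r_t = (r1 + p_1 + ... + p_{t-1}) / t.
    """
    refs, total = [], r1
    for t, p in enumerate(prices, start=1):
        refs.append(total / t)  # average of r1 and the t-1 earlier prices
        total += p
    return refs


def esm_reference_prices(prices, r1, alpha=0.5):
    """Reference price under exponential smoothing, for comparison.

    alpha is a hypothetical memory weight: r_{t+1} = alpha*r_t + (1-alpha)*p_t.
    """
    refs, r = [], r1
    for p in prices:
        refs.append(r)
        r = alpha * r + (1 - alpha) * p
    return refs


# One high price followed by nine rounds at price zero.
prices = [10.0] + [0.0] * 9
arm = arm_reference_prices(prices, r1=10.0)
esm = esm_reference_prices(prices, r1=10.0)
print("last ARM reference:", arm[-1])
print("last ESM reference:", esm[-1])
```

Under averaging, the early high price fades only at rate $1/t$, whereas under exponential smoothing it fades geometrically. This is the sense in which past prices have a longer-term effect under the averaging mechanism.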

Paper Structure (24 sections, 34 theorems, 134 equations, 1 figure, 4 algorithms)

Introduction
Our Contributions and Results
Related Work
Problem Formulation
Characterizing (Near-)Optimal Pricing Policy
The Sub-Optimality of Fixed Price
The (Near) Optimality of Markdown Pricing
Characterizing Near-optimal Markdown Price Curve
Learning and optimization under demand uncertainty
The Learning Challenges
Solution Ideas and the Proposed Learning Algorithm
Regret Analysis
Proof of
Conclusions and Future Directions
Missing Proofs in Section \ref{sec:opt}
...and 9 more sections

Key Result

Proposition 1

There exists an $\mathsf{ARM}$ problem instance with linear base demand model, i.e., $H(p) = b-a p$ and loss-neutral customers (i.e., $\eta^+ = \eta^-$), and an initial reference price $r_1$ such that for any fixed-price policy $\mathbf{p}$, we have $V^*(r_1) - V^{\mathbf{p}}(r_1) = \Omega(T)$.
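
A numerical illustration (not a proof) of the gap in Proposition 1 is sketched below. It assumes the standard reference-effect demand form $D(p, r) = H(p) + \eta^+ (r - p)^+ - \eta^- (p - r)^+$ with $H(p) = b - ap$, an averaging reference that counts $r_1$ as one observation, and hypothetical parameter values ($a = 1$, $b = 10$, $\eta^+ = \eta^- = 1$, $r_1 = 6$, $T = 1000$). The linear markdown path is an arbitrary non-increasing price path chosen for illustration, not the near-optimal curve characterized in the paper.

```python
def arm_revenue(prices, r1, a=1.0, b=10.0, eta_plus=1.0, eta_minus=1.0):
    """Total revenue of a price path under an averaging reference price.

    Assumptions (illustrative): linear base demand H(p) = b - a*p, demand
    D = H(p) + eta_plus*(r - p)^+ - eta_minus*(p - r)^+ clipped at zero,
    and r_t = (r1 + p_1 + ... + p_{t-1}) / t.
    """
    total_rev, running_sum = 0.0, r1
    for t, p in enumerate(prices, start=1):
        r = running_sum / t
        demand = (b - a * p) + eta_plus * max(r - p, 0.0) - eta_minus * max(p - r, 0.0)
        total_rev += p * max(demand, 0.0)
        running_sum += p
    return total_rev


T = 1000
r1 = 6.0

# Best fixed price on a grid of step 0.01 over [0, 10].
fixed_grid = [k / 100 for k in range(0, 1001)]
best_fixed = max(arm_revenue([p] * T, r1) for p in fixed_grid)

# A simple markdown path: prices fall linearly from 6 to 4.
markdown = [6.0 - 2.0 * t / (T - 1) for t in range(T)]
markdown_rev = arm_revenue(markdown, r1)

gap_per_round = (markdown_rev - best_fixed) / T
print("per-round revenue gap (markdown minus best fixed):", gap_per_round)
```

In this instance the per-round gap stays bounded away from zero, which is consistent with a revenue loss of order $T$ for any fixed price. The intuition is that a falling price keeps the averaged reference above the current price, so every round collects a positive reference bonus that a fixed price cannot sustain.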

Figures (1)

Figure 1: An illustration of the price curve defined in \ref{prop:approx markdown structure}.

Theorems & Definitions (66)

Definition 2.1: $\mathsf{ARM}$
Remark 1
Proposition 1
Remark 2
Definition 3.1: Markdown pricing policy
Theorem 1: Near optimality of markdown pricing policy
Lemma 3.1
Lemma 3.2
Lemma 3.3: Optimal revenue gap w.r.t. different starting reference price
Proposition 2
...and 56 more
