A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints

Francesco Emanuele Stradi; Filippo Cipriani; Lorenzo Ciampiconi; Marco Leonardi; Alessandro Rozza; Nicola Gatti

TL;DR

This paper tackles dynamic pricing of sequentially displayed complementary items under a sale constraint. It formulates the problem as a constrained Markov decision process (CMDP) and introduces a primal-dual online learning algorithm, PD-DP. PD-DP uses occupancy-measure representations, optimistic estimates, and epoch-based confidence sets on the transition function to learn under non-stationary demand and stochastic or adversarial rewards and constraints, without assuming that the reward or constraint functions are known. An empirical evaluation on synthetic data generated from real-world LastMinute datasets shows sublinear regret and effective constraint satisfaction, with a convex-combination policy addressing partial observability in intermediate states. The work enables coordinated pricing of interdependent items under realistic online learning constraints, and suggests avenues for real-world deployment and function-approximation enhancements.
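
To make the occupancy-measure representation concrete, the following is a minimal sketch that computes $q(x, a, x')$ by a forward pass over a layered, loop-free MDP. The function name `occupancy_measure`, the four-state toy MDP, and all of its probabilities are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def occupancy_measure(layers, P, policy):
    """Forward pass computing q(x, a, x') for a layered, loop-free MDP.

    layers: list of lists of state indices; layers[0] holds the single start state.
    P:      array (n_states, n_actions, n_states), P[x, a, x'] = transition probability.
    policy: array (n_states, n_actions), policy[x, a] = probability of action a in x.
    """
    n_states, n_actions, _ = P.shape
    q = np.zeros((n_states, n_actions, n_states))
    reach = np.zeros(n_states)          # probability of visiting each state
    reach[layers[0][0]] = 1.0
    for k in range(len(layers) - 1):
        for x in layers[k]:
            for a in range(n_actions):
                for x_next in layers[k + 1]:
                    q[x, a, x_next] = reach[x] * policy[x, a] * P[x, a, x_next]
        # Probability of reaching each state of the next layer.
        for x_next in layers[k + 1]:
            reach[x_next] = q[:, :, x_next].sum()
    return q

# Hypothetical toy MDP: state 0 is the first page, states 1 and 2 form the
# second layer, state 3 is terminal. Actions 0/1 stand for a low/high price.
layers = [[0], [1, 2], [3]]
P = np.zeros((4, 2, 4))
P[0, 0] = [0.0, 0.7, 0.3, 0.0]   # low price: customer continues more often
P[0, 1] = [0.0, 0.4, 0.6, 0.0]   # high price: customer leaves more often
P[1, :, 3] = 1.0
P[2, :, 3] = 1.0
policy = np.array([[0.5, 0.5], [1.0, 0.0], [0.5, 0.5], [0.5, 0.5]])

q = occupancy_measure(layers, P, policy)
```

In this toy instance the entries of `q` sum to one within each layer, and the mass leaving a state equals the mass entering it, which is the flow structure a valid occupancy measure must have.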

Abstract

We address the challenging problem of dynamically pricing complementary items that are sequentially displayed to customers. An illustrative example is the online sale of flight tickets, where customers navigate through multiple web pages: they first view the ticket cost, followed by ancillary expenses such as insurance and additional luggage fees. Coherent pricing policies for complementary items are essential because optimizing the price of each item individually is ineffective. Our scenario also involves a sales constraint, which specifies a minimum number of items to sell, and uncertainty regarding customer demand curves. To tackle this problem, we propose an original formulation as a Markov decision process with constraints. Leveraging online learning tools, we design a primal-dual online optimization algorithm. We empirically evaluate our approach on synthetic settings randomly generated from real-world data, covering configurations from stationary to non-stationary, and compare its performance in terms of constraint violation and regret against well-known baselines that optimize each state independently.
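
As a rough illustration of the primal-dual idea, the sketch below prices a single item under a minimum sale-rate constraint. It is a deliberately simplified stand-in and not the paper's algorithm: there is one page instead of an MDP, and the primal step uses UCB-style optimistic estimates rather than updates over occupancy measures. The function name `primal_dual_pricing`, the parameters `rho`, `eta`, `lam_max`, and the demand values are all hypothetical.

```python
import numpy as np

def primal_dual_pricing(prices, buy_prob, rho, T, eta=0.05, lam_max=10.0, seed=0):
    """Single-item Lagrangian pricing loop (illustrative simplification).

    prices:   candidate prices (array of length K).
    buy_prob: true purchase probability per price, unknown to the learner.
    rho:      required minimum average sale rate.
    """
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    K = len(prices)
    pulls = np.zeros(K)
    sales = np.zeros(K)
    lam = 0.0                              # dual variable for the sale constraint
    total_revenue, total_sales = 0.0, 0.0
    for t in range(1, T + 1):
        # Optimistic estimate of the purchase probability of each price.
        mean = sales / np.maximum(pulls, 1)
        bonus = np.sqrt(2 * np.log(t + 1) / np.maximum(pulls, 1))
        p_ucb = np.where(pulls > 0, np.minimum(mean + bonus, 1.0), 1.0)
        # Primal step: pick the price maximizing the optimistic Lagrangian.
        lagrangian = prices * p_ucb + lam * (p_ucb - rho)
        k = int(np.argmax(lagrangian))
        sold = float(rng.random() < buy_prob[k])
        pulls[k] += 1
        sales[k] += sold
        total_revenue += prices[k] * sold
        total_sales += sold
        # Dual step: raise lam when the sale target is missed, lower it otherwise.
        lam = float(np.clip(lam + eta * (rho - sold), 0.0, lam_max))
    return total_revenue / T, total_sales / T, lam

avg_revenue, sale_rate, lam = primal_dual_pricing(
    prices=[1.0, 2.0, 3.0], buy_prob=[0.8, 0.5, 0.2], rho=0.6, T=2000
)
```

The dual variable acts as a price on constraint violation: while sales fall short of `rho`, `lam` grows and pushes the primal step toward prices that sell more often, even at lower revenue.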

Paper Structure (50 sections, 1 theorem, 16 equations, 8 figures, 1 table, 2 algorithms)

Introduction
Original contribution
Paper structure
Related Work
Reinforcement Learning for Dynamic Pricing
Online Learning in Markov Decision Processes
Problem Formulation
Mathematical formulation of the model
Mathematical tools
Algorithm
Initialization and loss composition
Estimation of the unknown parameters
Primal loss estimation
Epochs and transition functions
Per-episode update
...and 35 more sections

Key Result

Lemma 1

For every $\boldsymbol{q} \in [0, 1]^{|X\times A\times X|}$, it holds that $\boldsymbol{q}$ is a valid occupancy measure of an episodic loop-free MDP if and only if the following three conditions hold: where $P$ is the transition function of the MDP and $P^{\boldsymbol{q}}$ is the one induced by $\boldsymbol{q}$ (see Equation eq:induced_trans).
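
For reference, the characterization usually cited for this result (Rosenberg and Mansour, 2019, which the key rosenberg19a appears to point to) takes the form below. The layer sets $X_k$ and the number of layers $L$ are notation assumed from that reference rather than from the text above, so the paper's own indexing may differ.

```latex
\begin{align*}
&\text{(1)}\quad \sum_{x \in X_k} \sum_{a \in A} \sum_{x' \in X_{k+1}} q(x, a, x') = 1
  && \forall k \in \{0, \dots, L-1\}, \\
&\text{(2)}\quad \sum_{a \in A} \sum_{x' \in X_{k+1}} q(x, a, x')
  = \sum_{x' \in X_{k-1}} \sum_{a \in A} q(x', a, x)
  && \forall k \in \{1, \dots, L-1\},\ \forall x \in X_k, \\
&\text{(3)}\quad P^q = P,
  \qquad \text{with} \quad
  P^q(x' \mid x, a) = \frac{q(x, a, x')}{\sum_{y \in X} q(x, a, y)}.
\end{align*}
```

Condition (1) says each layer carries total probability one, condition (2) is flow conservation through every intermediate state, and condition (3) requires the transitions implied by $q$ to match the true ones.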

Figures (8)

Figure 1: The (constrained) Markov decision process employed to model the dynamic pricing of complementary items problem. The states colored in blue are the ones where the agent (website) chooses the prices. The orange ones are the exit states, while the grey one is the payment page state.
Figure 2: Cumulative Regret (CR), Cumulative Constraints Violation (CCV), and the sum of CR and CCV related to the experiment.
Figure 3: Cumulative Regret (CR), Cumulative Constraints Violation (CCV), and the sum of CR and CCV related to the experiment.
Figure 4: Cumulative Regret (CR), Cumulative Constraints Violation (CCV), and the sum of CR and CCV related to the experiment.
Figure 5: Cumulative Regret (CR), Cumulative Constraints Violation (CCV), and the sum of CR and CCV related to the experiment.
...and 3 more figures

Theorems & Definitions (2)

Remark 1
Lemma 1: rosenberg19a
