Poisson-MNL Bandit: Nearly Optimal Dynamic Joint Assortment and Pricing with Decision-Dependent Customer Arrivals

Junhui Cai; Ran Chen; Qitao Huang; Linda Zhao; Wu Zhu

Abstract

We study dynamic joint assortment and pricing where a seller updates decisions at regular accounting/operating intervals to maximize the cumulative per-period revenue over a horizon $T$. In many settings, assortment and prices affect not only what an arriving customer buys but also how many customers arrive within the period, whereas classical multinomial logit (MNL) models treat arrivals as fixed, potentially leading to suboptimal decisions. We propose a Poisson-MNL model that couples a contextual MNL choice model with a Poisson arrival model whose rate depends on the offered assortment and prices. Building on this model, we develop an efficient algorithm, PMNL, based on the upper confidence bound (UCB) principle. We establish its (near) optimality by proving a non-asymptotic regret bound of order $\sqrt{T\log{T}}$ and a matching lower bound (up to a $\log T$ factor). Simulation studies underscore the importance of accounting for the dependency of arrival rates on assortment and pricing: PMNL effectively learns the customer choice and arrival models and provides joint assortment-pricing decisions that outperform benchmarks that assume fixed arrival rates.
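To make the coupling between arrivals and choices concrete, the following is a minimal simulation sketch of a single period, not the paper's specification. It assumes a log-linear arrival rate in assortment size and mean offered price, and a utility that is linear in price; the function name `simulate_period` and the parameters `alpha`, `beta`, `theta0`, `theta_size`, and `theta_price` are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_period(assortment, prices, alpha, beta,
                    theta0, theta_size, theta_price, rng):
    """Simulate one period of an illustrative Poisson-MNL model.

    All functional forms here are assumptions made for illustration:
    - utility of product j: alpha[j] - beta * price[j]
    - arrival rate: exp(theta0 + theta_size * |S| - theta_price * mean price)
    """
    assortment = np.asarray(assortment)
    p = np.asarray(prices, dtype=float)[assortment]

    # MNL choice probabilities; the no-purchase option has weight 1.
    weights = np.exp(np.asarray(alpha, dtype=float)[assortment] - beta * p)
    denom = 1.0 + weights.sum()
    choice_probs = weights / denom

    # Decision-dependent Poisson arrival rate (assumed log-linear form).
    rate = np.exp(theta0 + theta_size * len(assortment)
                  - theta_price * p.mean())
    n_arrivals = rng.poisson(rate)

    # Each arriving customer picks one offered product or leaves.
    probs_full = np.append(choice_probs, 1.0 / denom)
    counts = rng.multinomial(n_arrivals, probs_full)

    return {
        "rate": float(rate),
        "n_arrivals": int(n_arrivals),
        "counts": counts,  # last entry counts no-purchase customers
        "revenue": float(counts[:-1] @ p),
        # Expected per-period revenue = arrival rate x expected revenue per arrival.
        "expected_revenue": float(rate * (choice_probs @ p)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    result = simulate_period(
        assortment=[0, 2], prices=[4.0, 5.0, 6.0],
        alpha=np.array([1.0, 0.5, 1.5]), beta=0.3,
        theta0=2.0, theta_size=0.1, theta_price=0.05, rng=rng,
    )
    print(result)
```

Under these assumptions, the expected per-period revenue is the product of the arrival rate and the expected revenue per arriving customer, so a price change moves both factors at once. A model with a fixed arrival rate would capture only the second factor.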

Paper Structure (80 sections, 34 theorems, 317 equations, 5 figures, 1 table, 1 algorithm)

Introduction
Related Literature
Dynamic Assortment and Pricing.
Bandits.
Customer Arrivals.
Notation
Outline
Problem Formulation
Choice Model with Poisson Arrival
Poisson arrival model.
MNL choice model.
Retailer's Objective and Regret
Assumptions
Algorithm
Likelihood Function
...and 65 more sections

Key Result

Lemma 1

The following inequalities hold:

Figures (5)

Figure 1: Comparison of cumulative regret of PMNL and $\texttt{FM23}$ [ferreira2023demand] across three settings. In each panel, the solid line shows the sample average cumulative regret over 100 simulations for PMNL, and the dashed line corresponds to $\texttt{FM23}$. Shaded regions indicate the 10th and 90th percentiles.
Figure 2: Pricing decisions for the products. The solid line shows the median price under PMNL across 100 simulations, and the dashed line corresponds to $\texttt{FM23}$. Shaded regions indicate the 10th and 90th percentiles.
Figure 3: Comparison of cumulative regret of PMNL and a naive UCB algorithm. The solid line shows the sample average of the cumulative regret across 100 simulations for PMNL, while the dashed line corresponds to UCB. Shaded regions indicate the 10th and 90th percentiles.
Figure EC.1: Proof Structure of \ref{th:regret_bound}.
Figure EC.2: Estimation Error of Unknown Parameters by $\texttt{PMNL}$ and $\texttt{FM23}$. The left panel shows the estimation error for the customer preference for both algorithms, while the right panel shows the estimation error for the arrival parameters, which only appear in $\texttt{PMNL}$.

Theorems & Definitions (69)

Remark 1
Remark 2
Definition 1
Definition 2
Definition 3
Lemma 1
Theorem 1
Remark 3
Remark 4
Definition 4: Sub-Gaussian
...and 59 more
