Online Assortment and Price Optimization Under Contextual Choice Models

Yigit Efe Erginbas, Thomas A. Courtade, Kannan Ramchandran

TL;DR

An algorithm is proposed that learns from user feedback and achieves a revenue regret of order $\widetilde{O}(d \sqrt{K T} / L_0 )$ where $L_0$ is the minimum price sensitivity parameter.

Abstract

We consider an assortment selection and pricing problem in which a seller has $N$ different items available for sale. In each round, the seller observes a $d$-dimensional contextual preference information vector for the user, and offers to the user an assortment of $K$ items at prices chosen by the seller. The user selects at most one of the products from the offered assortment according to a multinomial logit choice model whose parameters are unknown. The seller observes which, if any, item is chosen at the end of each round, with the goal of maximizing cumulative revenue over a selling horizon of length $T$. For this problem, we propose an algorithm that learns from user feedback and achieves a revenue regret of order $\widetilde{O}(d \sqrt{K T} / L_0 )$, where $L_0$ is the minimum price sensitivity parameter. We also obtain a lower bound of order $\Omega(d \sqrt{T}/ L_0)$ for the regret achievable by any algorithm.
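As a rough illustration of the choice model described in the abstract, the sketch below computes multinomial logit purchase probabilities and the resulting expected revenue for one offered assortment. It is not the paper's algorithm: the function names, the normalization of the no-purchase utility to 0, and the linear utility form `alpha - beta * price` (with `beta` playing the role of a price sensitivity) are assumptions made for the example.

```python
import math


def mnl_choice_probs(utilities):
    """Return (no_purchase_prob, item_probs) under a multinomial logit model.

    Assumption: the outside (no-purchase) option has utility 0, so its
    weight is exp(0) = 1.
    """
    weights = [math.exp(u) for u in utilities]
    denom = 1.0 + sum(weights)
    return 1.0 / denom, [w / denom for w in weights]


def expected_revenue(alphas, betas, prices):
    """Expected single-round revenue for an offered assortment.

    Assumption (hypothetical utility form): item i has utility
    alpha_i - beta_i * price_i, where beta_i > 0 is its price sensitivity.
    """
    utilities = [a - b * p for a, b, p in zip(alphas, betas, prices)]
    _, probs = mnl_choice_probs(utilities)
    # The user buys at most one item, so revenue is price times purchase probability.
    return sum(p * q for p, q in zip(prices, probs))


if __name__ == "__main__":
    # Two items with identical (made-up) parameters, both priced at 1.0.
    print(expected_revenue([1.0, 1.0], [1.0, 1.0], [1.0, 1.0]))
```

Raising a price in this sketch increases the revenue per sale but lowers the purchase probability at a rate set by `beta`, which is the trade-off the seller has to learn from purchase feedback.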

Paper Structure

This paper contains 31 sections, 45 theorems, 261 equations, 3 figures, 2 algorithms.

Key Result

Proposition 3.1

Suppose utility functions $h_{ti}(p)$ are differentiable and strictly decreasing for all items $i \in [N]$. Let $B_t$ be the unique solution of the fixed point equation where $v_{ti}(B) := \max_{p \in \mathbb{R}} \left\{ f_{ti}(p) : p + 1 / h_{ti}'(p) = B \right\}$ and $f_{ti}(p) := - e^{h_{ti}(p)} / h_{ti}'(p)$. Then, the optimum assortment $S^*_t$ is the assortment $S$ that achieves the maximum

Figures (3)

  • Figure 2: The confidence region depicted in the top right corner contains the true parameter $\boldsymbol{\theta}^*$ with high probability. Each parameter in the confidence set corresponds to a different linear function, and we construct $h_{ti}(p)$ as a tight upper bound on $u_{ti}(p)$.
  • Figure 3: Cumulative regret for our algorithms CAP and CAP-ONS, M3P (Javanmard, Nazerzadeh, and Shao, 2020), ONS-MPP (Perivier and Goyal, 2022), a version of DBL-MNL (Oh and Iyengar, 2021) extended with our dynamic pricing, and a version of TS-MNL (Oh and Iyengar, 2019) extended with our dynamic pricing. The center lines show the mean across the runs, while the shaded regions indicate two standard deviations. The results demonstrate the efficacy of our algorithms in achieving diminishing regret per round, as our theoretical results predict. Since M3P and ONS-MPP consider only dynamic pricing, they are not able to achieve diminishing regret. DBL-MNL and TS-MNL are designed solely for assortment selection, but their extensions using our pricing approach enable simultaneous assortment selection and pricing. However, even with dynamic pricing, their regret rates quickly deteriorate as $K$ increases or $L_0$ decreases.
  • Figure 4: Log-log plot of the regret of our proposed algorithm CAP against the time horizon $T$. The slope of the curve reflects the empirical growth rate of regret with respect to $T$.

Theorems & Definitions (80)

  • Proposition 3.1: Optimum assortment and prices
  • proof
  • Remark 3.2
  • Theorem 4.2
  • proof
  • Remark 4.3
  • Theorem 4.4
  • Theorem 4.5
  • proof
  • Proposition A.1
  • ...and 70 more