Online Joint Assortment-Inventory Optimization under MNL Choices

Yong Liang, Xiaojie Mao, Shiyuan Wang

TL;DR

This work tackles online joint assortment-inventory optimization under Multinomial Logit (MNL) choices with unknown attraction parameters $\boldsymbol{v}$. It introduces a Counting Estimator (CE) for each $v_i$, built from the number of no-purchases observed between consecutive purchases of product $i$, and an exploration mechanism that adaptively tunes the unit profits $\boldsymbol{r}$ to incentivize exploration while exploiting current estimates. The core algorithm combines CE-based parameter learning with a UCB-style decision rule that is adjusted by $\hat{\boldsymbol{r}}_t$ to produce optimistic profit estimates. An exact static optimization oracle is assumed at first; approximate oracles are considered afterwards. The authors establish nonasymptotic regret upper and lower bounds that indicate near-optimal performance under an exact oracle, and they bound the additional regret incurred when the oracle is replaced by a practical approximate solver. They evaluate the algorithm in small- and large-scale numerical experiments and in calibrations on real-world data, and they extend the analysis to inventory carryover and unknown customer arrival distributions.
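
The counting idea behind the CE can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the outside option is given attraction 1, inventory is assumed ample so every product stays available, and the estimate is taken as the reciprocal of the average no-purchase count, which is one natural choice when that count is geometric with mean $1/v_i$.

```python
import random


def simulate_no_purchase_gaps(v, target, n_customers, seed=0):
    """Simulate MNL choices and record, for product `target`, the number of
    no-purchases seen between its consecutive purchases.

    Assumptions (illustrative only): the outside option has attraction 1,
    and all products remain in stock for every customer.
    """
    rng = random.Random(seed)
    options = [None] + list(range(len(v)))  # None stands for "no purchase"
    weights = [1.0] + list(v)               # MNL: P(option) is proportional to its attraction
    gaps = []
    no_purchases = 0
    for _ in range(n_customers):
        choice = rng.choices(options, weights=weights)[0]
        if choice is None:
            no_purchases += 1
        elif choice == target:
            gaps.append(no_purchases)       # close the current gap
            no_purchases = 0
        # purchases of other products are ignored by this count
    return gaps


def counting_estimate(gaps):
    """Estimate v_i as 1 / (average no-purchase count); hypothetical helper."""
    if not gaps:
        raise ValueError("no purchases of the target product were observed")
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap == 0:
        raise ValueError("average gap is zero; the estimate is unbounded")
    return 1.0 / mean_gap


if __name__ == "__main__":
    true_v = [0.5, 1.0, 2.0]
    for i, v_i in enumerate(true_v):
        gaps = simulate_no_purchase_gaps(true_v, target=i, n_customers=200_000, seed=1)
        print(i, v_i, round(counting_estimate(gaps), 3))
```

With many simulated customers the estimates should land close to the true attractions. The sketch leaves out stockouts, which the joint assortment-inventory setting has to handle because they change the set of available products within a cycle.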

Abstract

We study an online joint assortment-inventory optimization problem, in which we assume that the choice behavior of each customer follows the Multinomial Logit (MNL) choice model, and the attraction parameters are unknown a priori. The retailer makes periodic assortment and inventory decisions to dynamically learn about the attraction parameters from customer choice observations while maximizing the expected total profit over time. In this paper, we propose a novel algorithm that can effectively balance exploration and exploitation in the online decision-making of assortment and inventory. Our algorithm builds on a new estimator for the MNL attraction parameters, an innovative approach to incentivize exploration by adaptively tuning certain known and unknown parameters, and an optimization oracle for static single-cycle assortment-inventory planning problems with given parameters. We establish a regret upper bound for our algorithm and a lower bound for the online joint assortment-inventory optimization problem, suggesting that our algorithm achieves a nearly optimal regret rate, provided that the static optimization oracle is exact. Then we incorporate more practical approximate static optimization oracles into our algorithm, and bound from above the impact of static optimization errors on the regret of our algorithm. We perform numerical studies to demonstrate the effectiveness of our proposed algorithm. Finally, we extend our study by incorporating inventory carryover and the learning of the customer arrival distribution.

Paper Structure

This paper contains 40 sections, 24 theorems, 81 equations, 9 figures, 3 tables, 5 algorithms.

Key Result

Lemma 1

For any product $i\in [N]$, $\mu_{i, k}$'s are independent and identically distributed geometric random variables with parameter $v_{i} / (1 + v_{i})$, whose distribution and expectation are given as follows:
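
For reference, a geometric random variable with success probability $v_i/(1+v_i)$ has the following distribution and mean. This is a hedged reconstruction from the standard geometric law, assuming that $\mu_{i,k}$ counts the no-purchases (failures) preceding a purchase of product $i$, so that its support starts at $0$:

$$
\mathbb{P}(\mu_{i,k} = m) = \left(\frac{1}{1+v_i}\right)^{m} \frac{v_i}{1+v_i}, \quad m = 0, 1, 2, \dots, \qquad \mathbb{E}[\mu_{i,k}] = \frac{1}{v_i}.
$$

Under the alternative convention that counts trials up to and including the purchase, the support would start at $1$ and the mean would be $(1+v_i)/v_i$.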

Figures (9)

  • Figure 1: Sequence of events in cycle $t$: At the beginning of inventory cycle $t$, the retailer makes inventory ordering decision $\boldsymbol{u}_t$. Then, $M_t$ customers arrive sequentially, and each customer either purchases one product from the available assortment or leaves for the outside option. At the end of cycle $t$, inventory leftovers are salvaged.
  • Figure 2: An example of the estimation procedure.
  • Figure 3: Average regrets of our algorithm and two benchmark algorithms over different time horizons $T$ under the two settings listed in the paper's table of numerical settings. Results are based on $10$ independent replications of the experiment.
  • Figure 4: Boxplots of the cumulative profits at four time points, $T \in \{5000, 10000, 15000, 20000\}$. Results are based on $1000$ independent replications of each experiment.
  • Figure 5: Boxplots of the cumulative profits at four time points, $t \in \{5000, 10000, 15000, 20000\}$. Results are based on $1000$ independent replications of each experiment.
  • ...and 4 more figures

Theorems & Definitions (28)

  • Example 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Example 2
  • Lemma 4
  • Lemma 5
  • Theorem 1
  • Definition 1
  • Lemma 6
  • ...and 18 more