Online Joint Assortment-Inventory Optimization under MNL Choices
Yong Liang, Xiaojie Mao, Shiyuan Wang
TL;DR
This work tackles online joint assortment-inventory optimization under Multinomial Logit (MNL) choices with unknown attraction parameters $\boldsymbol{v}$. It introduces a novel Counting Estimator (CE) for $v_i$ using no-purchase inter-purchase counts, and a novel exploration mechanism that adaptively tunes unit profits $\boldsymbol{r}$ to incentivize exploration while exploiting current estimates. The core algorithm combines CE-based parameter learning with a UCB-style decision rule that uses the adjusted profits $\hat{\boldsymbol{r}}_t$ to produce optimistic profit estimates. An exact static optimization oracle is assumed initially; later extensions cover approximate oracles, inventory carryover, and unknown arrival distributions. The authors establish nonasymptotic regret upper and lower bounds, showing near-optimal performance under exact oracles, and bound the additional regret incurred when practical approximate solvers are used. They demonstrate effectiveness through small- and large-scale numerical experiments and real-world data calibrations, highlighting practical impact for data-driven retail optimization.
Abstract
We study an online joint assortment-inventory optimization problem in which the choice behavior of each customer follows the Multinomial Logit (MNL) choice model and the attraction parameters are unknown a priori. The retailer makes periodic assortment and inventory decisions to learn the attraction parameters from observed customer choices while maximizing the expected total profit over time. In this paper, we propose a novel algorithm that effectively balances exploration and exploitation in the online decision-making of assortment and inventory. Our algorithm builds on a new estimator for the MNL attraction parameters, an innovative approach that incentivizes exploration by adaptively tuning certain known and unknown parameters, and an optimization oracle for static single-cycle assortment-inventory planning problems with given parameters. We establish a regret upper bound for our algorithm and a lower bound for the online joint assortment-inventory optimization problem, suggesting that our algorithm achieves a nearly optimal regret rate, provided that the static optimization oracle is exact. We then incorporate more practical approximate static optimization oracles into our algorithm and bound from above the impact of static optimization errors on its regret. We perform numerical studies to demonstrate the effectiveness of our proposed algorithm. Finally, we extend our study by incorporating inventory carryover and the learning of the customer arrival distribution.
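To make the setting concrete, the following is a minimal sketch of how customer choices in a single inventory cycle might be simulated under MNL. It is an illustration only, not the authors' algorithm or estimator. It assumes the standard MNL normalization in which the no-purchase option has attraction 1, so that product $i$ in the available set $S$ is chosen with probability $v_i / (1 + \sum_{j \in S} v_j)$, and it assumes that customers arrive one at a time and that a product drops out of the available set once its stock reaches zero. The function names `mnl_choice` and `simulate_cycle` are hypothetical.

```python
import random


def mnl_choice(v, available, rng):
    """Sample one customer's choice under MNL.

    v: dict mapping product -> attraction parameter
       (assumption: the no-purchase option is normalized to attraction 1).
    available: iterable of products currently in stock.
    Returns a product, or None if the customer buys nothing.
    """
    items = list(available)
    # First weight belongs to the no-purchase option.
    weights = [1.0] + [v[i] for i in items]
    return rng.choices([None] + items, weights=weights, k=1)[0]


def simulate_cycle(v, inventory, n_customers, rng):
    """Serve n_customers sequentially within one cycle.

    Assumption: a product is offered only while it has stock left,
    so later customers choose from a shrinking available set.
    Returns (sales per product, number of no-purchase customers).
    """
    stock = dict(inventory)
    sales = {i: 0 for i in stock}
    no_purchase = 0
    for _ in range(n_customers):
        available = [i for i, q in stock.items() if q > 0]
        choice = mnl_choice(v, available, rng)
        if choice is None:
            no_purchase += 1
        else:
            stock[choice] -= 1
            sales[choice] += 1
    return sales, no_purchase


if __name__ == "__main__":
    rng = random.Random(0)
    v = {"a": 1.0, "b": 0.5}          # illustrative attraction parameters
    inventory = {"a": 2, "b": 3}      # illustrative stocking levels
    print(simulate_cycle(v, inventory, n_customers=50, rng=rng))
```

In the online problem the retailer does not know `v`; it observes only sales and no-purchase outcomes such as those returned by `simulate_cycle`, and must choose the next cycle's assortment and inventory from them.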
