A Re-solving Heuristic for Dynamic Assortment Optimization with Knapsack Constraints
Xi Chen, Mo Liu, Yining Wang, Yuan Zhou
TL;DR
This work tackles dynamic assortment optimization under knapsack constraints with a multinomial logit (MNL) demand model. It introduces an epoch-based re-solving algorithm that reformulates the non-linear fluid objective into a sequence of linear programs via a denominator-to-constraint transformation, and it samples feasible assortments using the Birkhoff-von Neumann decomposition. The authors prove that the cumulative regret scales logarithmically with the time horizon $T$ and the resource capacities under standard assumptions, and they validate the algorithm with numerical experiments showing superior performance over baselines. The method offers a scalable, theoretically grounded approach to revenue management problems with inventory constraints and non-linear choice effects, with potential extensions to learning unknown models and more complex choice structures.
Abstract
In this paper, we consider a multi-stage dynamic assortment optimization problem with multinomial logit (MNL) choice modeling under resource knapsack constraints. Given the current resource inventory levels, the retailer makes an assortment decision in each period, and the retailer's goal is to maximize the total profit from purchases. Because the exact optimal dynamic assortment solution is computationally intractable, a practical strategy is to adopt the re-solving technique, which periodically re-optimizes deterministic linear programs (LPs) arising from the fluid approximation. However, the fractional structure of MNL makes the fluid approximation in assortment optimization non-linear, which raises new technical challenges. To address them, we propose a new epoch-based re-solving algorithm that moves the denominator of the objective into the constraints, so that the re-solving technique is applied to a linear program with additional slack variables that is amenable to practical computation and theoretical analysis. Theoretically, we prove that the regret (i.e., the gap between the re-solving policy and the optimal objective of the fluid approximation) scales logarithmically with the length of the time horizon and the resource capacities.
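To make the "fractional structure" concrete, the following is a minimal sketch of the textbook MNL revenue formula and of the standard change of variables that turns the ratio into a linear objective with a normalization constraint. It is an illustration under stated assumptions, not the authors' formulation: the function names, the example numbers, and the normalization of the no-purchase weight to 1 are all hypothetical choices made here, and the knapsack constraints and epoch structure of the paper are omitted.

```python
from typing import Dict, List, Sequence, Set, Tuple


def mnl_choice_probs(weights: Sequence[float], assortment: Set[int]) -> Dict[int, float]:
    """Purchase probability of each offered product under MNL.

    Assumption: the no-purchase option has preference weight 1.
    """
    denom = 1.0 + sum(weights[i] for i in assortment)
    return {i: weights[i] / denom for i in assortment}


def expected_revenue(revenues: Sequence[float], weights: Sequence[float],
                     assortment: Set[int]) -> float:
    """Fractional (non-linear) expected revenue of one assortment."""
    probs = mnl_choice_probs(weights, assortment)
    return sum(revenues[i] * p for i, p in probs.items())


def to_linear_variables(weights: Sequence[float],
                        assortment: Set[int]) -> Tuple[float, List[float]]:
    """Change of variables: y0 = 1 / denominator, y[i] = x[i] * y0.

    Here x[i] is 1 if product i is offered and 0 otherwise.
    """
    y0 = 1.0 / (1.0 + sum(weights[i] for i in assortment))
    y = [y0 if i in assortment else 0.0 for i in range(len(weights))]
    return y0, y


def linear_objective(revenues: Sequence[float], weights: Sequence[float],
                     y: Sequence[float]) -> float:
    """The same revenue, now linear in y: sum_i r_i * v_i * y_i."""
    return sum(r * v * yi for r, v, yi in zip(revenues, weights, y))


if __name__ == "__main__":
    revenues = [10.0, 8.0, 5.0]   # hypothetical per-product revenues
    weights = [0.5, 1.0, 2.0]     # hypothetical MNL preference weights
    assortment = {0, 2}           # offer products 0 and 2

    y0, y = to_linear_variables(weights, assortment)
    print("fractional objective:", expected_revenue(revenues, weights, assortment))
    print("linear objective:    ", linear_objective(revenues, weights, y))
    # The denominator survives only as this equality constraint on (y0, y).
    print("normalization:       ", y0 + sum(v * yi for v, yi in zip(weights, y)))
```

In the transformed variables the objective is linear, and the former denominator appears only through the constraint that `y0` plus the weighted sum of `y` equals 1, together with `0 <= y[i] <= y0`. That is the general sense in which a denominator can be moved into the constraints of a linear program.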
