Survey of Data-driven Newsvendor: Unified Analysis and Spectrum of Achievable Regrets
Zhuoxin Chen, Will Ma
TL;DR
This work provides a unified, quantitative analysis of data-driven Newsvendor decisions when the demand distribution is unknown. By introducing the $(\beta,\gamma,\zeta)$-clustered distribution concept, it characterizes how closely the empirical decision tracks the optimal one and derives regret bounds that span the full spectrum from $O(n^{-1/2})$ to $O(n^{-1})$ for high-probability and expectation benchmarks. It also establishes tight additive lower bounds via a Hellinger-distance-based construction, showing that no algorithm can surpass the identified rates across the spectrum. Simulations on common demand distributions validate the theory and reveal crossover behavior driven by sample size and distribution-local clustering. The results offer a complete, data-size-aware picture of learnability for Newsvendor decisions and guide practitioners on when SAA suffices or when robust alternatives are warranted.
Abstract
In the Newsvendor problem, the goal is to guess the number that will be drawn from some distribution, with asymmetric consequences for guessing too high vs. too low. In the data-driven version, the distribution is unknown, and one must work with samples from the distribution. Data-driven Newsvendor has been studied under many variants: additive vs. multiplicative regret, high-probability vs. expectation bounds, and different distribution classes. This paper studies all combinations of these variants, filling in many gaps in the literature and simplifying many proofs. In particular, we provide a unified analysis based on the notion of clustered distributions, which, in conjunction with our new lower bounds, shows that the entire spectrum of regrets between $1/\sqrt{n}$ and $1/n$ is achievable. Simulations on commonly used distributions demonstrate that our notion is the "correct" predictor of empirical regret across varying data sizes.
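To make the setup concrete, the following is a minimal sketch of the data-driven Newsvendor with the sample average approximation (SAA) decision, which orders the empirical quantile of the observed demands. The cost parameters `underage` and `overage`, the Uniform(0, 1) demand, and all function names are illustrative assumptions and do not come from the paper.

```python
import math
import random


def newsvendor_cost(order, demand, underage=3.0, overage=1.0):
    """Asymmetric cost: `underage` per unit guessed too low, `overage` per unit too high."""
    return underage * max(demand - order, 0.0) + overage * max(order - demand, 0.0)


def saa_order(samples, underage=3.0, overage=1.0):
    """SAA decision: the empirical q-quantile, with q = underage / (underage + overage)."""
    q = underage / (underage + overage)
    ordered = sorted(samples)
    k = max(math.ceil(q * len(ordered)), 1)  # smallest k with k/n >= q
    return ordered[k - 1]


def expected_cost_uniform(order, underage=3.0, overage=1.0):
    """Exact expected cost when demand ~ Uniform(0, 1) and 0 <= order <= 1 (assumed demand model)."""
    return underage * (1.0 - order) ** 2 / 2.0 + overage * order ** 2 / 2.0


def additive_regret_uniform(samples, underage=3.0, overage=1.0):
    """Additive regret of the SAA decision: its expected cost minus the optimal expected cost."""
    q = underage / (underage + overage)  # optimal order under Uniform(0, 1) demand
    order = saa_order(samples, underage, overage)
    return (expected_cost_uniform(order, underage, overage)
            - expected_cost_uniform(q, underage, overage))


# Illustrative run: regret of SAA for a few sample sizes n.
random.seed(0)
for n in (10, 100, 1000):
    samples = [random.random() for _ in range(n)]
    print(n, additive_regret_uniform(samples))
```

Averaging the printed regret over many independent draws for each n would give an empirical estimate of the expected additive regret, one of the benchmarks the abstract refers to.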
