The Data-Driven Censored Newsvendor Problem

Chamsi Hssaine, Sean R. Sinclair

TL;DR

This work introduces a data-driven censored newsvendor framework and uses distributionally robust optimization to quantify the impact of censored historical data on ordering decisions. A precise identifiability criterion, based on the observable boundary λ and the critical ratio ρ, yields a sharp dichotomy: if G^−(λ) ≥ ρ, vanishing minimax regret is achievable and the robust quantity coincides with the true optimum, q^Δ = q^*_G; otherwise, the problem is unidentifiable and the information loss Δ is strictly positive. The authors derive closed-form expressions for Δ and q^Δ and propose Robust Censored Newsvendor (RCN), a two-stage algorithm that adapts to the level of censoring and enjoys finite-sample guarantees in every regime (Regret ≤ Δ + Õ(1/√N)); matching lower bounds show that it is near-optimal up to polylogarithmic factors. Extensive experiments on synthetic and real data confirm robust performance across censoring regimes and datasets. The framework offers practical guidance for inventory decisions under censored data and lays groundwork for extensions to contextual and multi-period settings.
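The identifiability criterion can be checked directly once G and ρ are fixed. The sketch below is a hypothetical illustration, not code from the paper: it assumes the standard newsvendor critical ratio ρ = b/(b+h) (b the underage cost, h the overage cost) and an Exponential demand cdf, and the function names are invented for this example. For a continuous G, G^−(λ) = G(λ), so the test G^−(λ) ≥ ρ is equivalent to λ ≥ q^*_G = inf{q : G(q) ≥ ρ}, which matches the description of the identifiable regime in Figure 2.

```python
import math


def critical_ratio(b: float, h: float) -> float:
    """Standard newsvendor critical ratio rho = b / (b + h) (assumed definition)."""
    return b / (b + h)


def exp_cdf(x: float, mean: float) -> float:
    """cdf of a hypothetical Exponential demand; continuous, so G^-(x) = G(x)."""
    return 0.0 if x <= 0 else 1.0 - math.exp(-x / mean)


def exp_quantile(rho: float, mean: float) -> float:
    """q*_G = inf{q : G(q) >= rho}, in closed form for the Exponential."""
    return -mean * math.log(1.0 - rho)


def is_identifiable(lam: float, rho: float, mean: float) -> bool:
    """Identifiability test G^-(lam) >= rho for the observable boundary lam."""
    return exp_cdf(lam, mean) >= rho


if __name__ == "__main__":
    rho = critical_ratio(b=1.0, h=1.0)  # equal costs give rho = 0.5
    q_star = exp_quantile(rho, mean=80.0)  # 80 ln 2, about 55.45
    for lam in (50.0, 60.0):
        print(lam, is_identifiable(lam, rho, mean=80.0), lam >= q_star)
```

With a mean of 80 and ρ = 0.5, a boundary of λ = 60 lies above q^*_G ≈ 55.45 and passes the test, while λ = 50 does not.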

Abstract

We study a censored variant of the data-driven newsvendor problem, where the decision-maker must select an ordering quantity that minimizes expected overage and underage costs based only on offline censored sales data, rather than historical demand realizations. Our goal is to understand how the degree of historical demand censoring affects the performance of any learning algorithm for this problem. To isolate this impact, we adopt a distributionally robust optimization framework, evaluating policies according to their worst-case regret over an ambiguity set of distributions. This set is defined by the largest historical order quantity (the observable boundary of the dataset), and contains all distributions matching the true demand distribution up to this boundary, while allowing them to be arbitrary afterwards. We demonstrate a spectrum of achievability under demand censoring by deriving a natural necessary and sufficient condition under which vanishing regret is an achievable goal. In regimes in which it is not, we exactly characterize the information loss due to censoring: an insurmountable lower bound on the performance of any policy, even when the decision-maker has access to infinitely many demand samples. We then leverage these sharp characterizations to propose a natural robust algorithm that adapts to the historical level of demand censoring. We derive finite-sample guarantees for this algorithm across all possible censoring regimes and show its near-optimality with matching lower bounds (up to polylogarithmic factors). We moreover demonstrate its robust performance via extensive numerical experiments on both synthetic and real-world datasets.
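The data model in the abstract can be made concrete with a short simulation. The following is a minimal sketch under stated assumptions, not the authors' RCN procedure. It generates censored sales min(D_i, q_i), records the observable boundary λ as the largest historical order quantity, and estimates G below λ with the Kaplan-Meier product-limit estimator, a standard tool for right-censored data that we substitute here for illustration. The function names, the Exponential demand with mean 80, and the uniform historical orders are hypothetical choices.

```python
import numpy as np


def censor(demand: np.ndarray, orders: np.ndarray):
    """Return observed sales min(D, q) and a flag marking stockouts (D >= q)."""
    sales = np.minimum(demand, orders)
    censored = demand >= orders  # on a stockout we only learn that D >= q
    return sales, censored


def km_cdf(sales: np.ndarray, censored: np.ndarray, x: float) -> float:
    """Kaplan-Meier estimate of P(D <= x) from right-censored sales."""
    # Sort by sales; at ties, uncensored observations come first (KM convention).
    idx = np.lexsort((censored.astype(int), sales))
    s, c = sales[idx], censored[idx]
    # Observations not yet processed; sequential updates reproduce the KM
    # factor (1 - d/n) at tied event times.
    at_risk = s.size - np.arange(s.size)
    factors = np.where(c, 1.0, 1.0 - 1.0 / at_risk)
    surv = np.cumprod(factors)
    k = np.searchsorted(s, x, side="right") - 1
    return 0.0 if k < 0 else float(1.0 - surv[k])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demand = rng.exponential(scale=80.0, size=20_000)  # hypothetical true demand
    orders = rng.uniform(50.0, 150.0, size=20_000)  # hypothetical past orders
    sales, censored = censor(demand, orders)
    lam = orders.max()  # observable boundary of the dataset
    print(lam, km_cdf(sales, censored, 60.0), 1.0 - np.exp(-60.0 / 80.0))
```

Because every sale satisfies min(D_i, q_i) ≤ λ, the data carry no information about G beyond λ; this is exactly the freedom that the ambiguity set leaves to the adversary.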

Paper Structure

This paper contains 36 sections, 20 theorems, 89 equations, 3 figures, and 1 algorithm.

Key Result

Proposition 1

The data-driven censored newsvendor problem is identifiable if and only if $\Delta = 0$.
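One way to build intuition for Proposition 1: the optimal newsvendor quantity is the ρ-quantile of demand, so the problem is identifiable exactly when every distribution consistent with the data shares that quantile. The sketch below checks this on the two-point family F_p described in Figure 3 (G below λ, mass p at λ, the remaining mass at M). It is an illustration rather than a proof, since F_p is only a subfamily of the ambiguity set, and it assumes a hypothetical Exponential G with mean 80 and M = 200 (reading Figure 2's Exponential(1/80) as a rate). The function names are invented for this example.

```python
import math


def rho_quantile_fp(rho: float, lam: float, M: float, p: float, mean: float) -> float:
    """inf{q : F_p(q) >= rho} for the two-point family F_p, with G Exponential(mean)."""
    g_left = 1.0 - math.exp(-lam / mean)  # G^-(lam); equals G(lam) since G is continuous
    if g_left >= rho:
        return -mean * math.log(1.0 - rho)  # quantile reached below lam: G^{-1}(rho)
    if g_left + p >= rho:
        return lam  # the atom of mass p at lam carries F_p past rho
    return M  # F_p only reaches rho at the atom at M


def quantiles_over_family(rho: float, lam: float, M: float, mean: float, num: int = 11) -> list:
    """Distinct optimal quantities across F_p for p on a grid over [0, 1 - G^-(lam)]."""
    g_left = 1.0 - math.exp(-lam / mean)
    ps = [(1.0 - g_left) * k / (num - 1) for k in range(num)]
    return sorted({round(rho_quantile_fp(rho, lam, M, p, mean), 6) for p in ps})


if __name__ == "__main__":
    print(quantiles_over_family(0.5, 60.0, 200.0, 80.0))  # identifiable boundary
    print(quantiles_over_family(0.5, 50.0, 200.0, 80.0))  # unidentifiable boundary
```

With ρ = 0.5, the boundary λ = 60 yields the single quantity 80 ln 2 ≈ 55.45 for every p, whereas λ = 50 yields [50.0, 200.0]: the best order depends on mass that the censored data cannot reveal, so no single quantity is optimal for all members.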

Figures (3)

  • Figure 1: Illustration of the ambiguity set $\mathcal{F}(\lambda;G)$ induced by an observable boundary $\lambda$ and cdf $G$, represented by the black curve. The seven colored curves are cdfs in $\mathcal{F}(\lambda;G)$: they coincide with $G(x)$ for all $x < \lambda$ and are arbitrary afterwards. For a larger boundary $\lambda' > \lambda$, only the light and dark green curves are contained in $\mathcal{F}(\lambda';G)$, since all other curves deviate from $G(x)$ for some $x \in [\lambda, \lambda')$.
  • Figure 2: Dependence of $q^{\Delta}$ and $\Delta$ on $\lambda$ for $D \sim \text{Exponential}(1/80)$, $M = 200$, $h = 1$, and $\rho \in \{0.1,0.3,0.5,0.7,0.9\}$. We abuse notation and let $q^\star_{10\rho}$ denote the optimal newsvendor quantity associated with $\rho$. By the theorem characterizing the minimax risk in the identifiable case, $\lambda \geq q^\star_{10\rho}$ corresponds to the identifiable regime.
  • Figure 3: Illustration of a set of distributions in $\mathcal{B}(\lambda; G)$, parameterized by $p \in [0, 1 - G^-(\lambda)]$ and denoted by $F_p$. The true distribution $G$ is represented by the black curve. For any $p$, $F_p(x) = G(x)$ for all $x < \lambda$; it places mass $p$ at $x = \lambda$, and mass $1-p-G^-(\lambda)$ at $x = M$.
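Figure 2 plots Δ against λ using the paper's closed form, which is not reproduced here. A numerical lower bound is still easy to obtain: restricting the adversary to the family F_p of Figure 3 can only shrink the worst case, so min_q max_p regret(q, F_p) ≤ Δ, assuming Δ is the minimax regret over $\mathcal{F}(\lambda;G)$ in the infinite-data limit. The sketch below makes further assumptions that the source does not fix: G is Exponential with mean 80 (reading Exponential(1/80) as a rate), the cost is C_F(q) = E_F[h(q − D)^+ + b(D − q)^+] with b = ρh/(1 − ρ), regret is measured against the best quantity on the same grid, and the continuous part of G on [0, λ) is discretized into bins carrying their exact G-mass. The function name and grid sizes are hypothetical.

```python
import numpy as np


def minimax_regret_lower_bound(lam, rho, mean, M, h=1.0, n_bins=500, n_q=1001, n_p=41):
    """Lower bound on minimax regret over F(lam; G) via the two-point family F_p.

    Returns (bound, robust quantity on the grid). All modeling choices are
    illustrative assumptions: Exponential G, b = rho * h / (1 - rho).
    """
    b = rho * h / (1.0 - rho)  # underage cost implied by rho = b / (b + h)
    edges = np.linspace(0.0, lam, n_bins + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    bin_mass = np.diff(1.0 - np.exp(-edges / mean))  # exact G-mass of each bin on [0, lam)
    g_left = 1.0 - np.exp(-lam / mean)  # G^-(lam) = G(lam) for continuous G
    support = np.concatenate([mids, [lam, M]])
    ps = np.linspace(0.0, 1.0 - g_left, n_p)
    # Column j of W is the pmf of F_{p_j}: G-bins, atom p at lam, rest at M.
    W = np.zeros((support.size, n_p))
    W[:n_bins, :] = bin_mass[:, None]
    W[n_bins, :] = ps
    W[n_bins + 1, :] = 1.0 - g_left - ps
    qs = np.union1d(np.linspace(0.0, M, n_q), [lam])
    diff = qs[:, None] - support[None, :]
    K = h * np.maximum(diff, 0.0) + b * np.maximum(-diff, 0.0)  # cost per outcome
    C = K @ W  # C[i, j]: expected cost of qs[i] under F_{p_j}
    regret = C - C.min(axis=0)  # regret against the best grid quantity for each F_p
    worst = regret.max(axis=1)  # adversary picks the worst p for each q
    i = int(np.argmin(worst))
    return float(worst[i]), float(qs[i])


if __name__ == "__main__":
    for lam in (50.0, 60.0):
        print(lam, minimax_regret_lower_bound(lam, rho=0.5, mean=80.0, M=200.0))
```

With ρ = 0.5, the boundary λ = 60 exceeds q^*_G ≈ 55.45 and the bound is numerically zero, with the returned quantity close to q^*_G; at λ = 50 the bound is strictly positive, consistent with Proposition 1. Because only a subfamily is searched, the positive value is a lower bound on Δ, not Δ itself.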

Theorems & Definitions (40)

  • Definition 1: Ambiguity set
  • Definition 2: Regret
  • Definition 3: Problem identifiability
  • Definition 4: Minimax risk
  • Proposition 1
  • Theorem 1
  • Corollary 1
  • Lemma 1
  • Lemma 2
  • Proposition 2
  • ...and 30 more