Calibrating an Imperfect Auxiliary Predictor for Unobserved No-Purchase Choice
Jiangkai Xiong, Kalyan Talluri, Hanzhao Wang
TL;DR
This paper addresses the problem of estimating no-purchase probabilities when only purchase data are observed. It introduces two calibration strategies for a biased outside-option predictor: an affine regression in logit space (linear calibration) and a nearly monotone maximum rank correlation (MRC) approach, each with finite-sample guarantees that separate predictor quality from utility-learning error. A key structural identity expresses the outside-option log-odds as the difference between the outside utility and the inclusive value of the offered products, which enables calibration without observing no-purchase events. The framework supports multiple predictors and robust aggregation. Experiments on synthetic data and real Expedia data show substantial improvements in outside-option estimation and downstream assortment revenue, particularly when predictor bias is nonlinear. The work provides practical tools for plug-in calibration in settings with censored outside-option data, with implications for market sizing and decision quality in retail and online platforms.
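The structural identity mentioned in the summary can be checked numerically. The following is a minimal sketch, assuming a standard MNL model with outside utility u0 and in-set utilities u_j, where the inclusive value is the log-sum-exp of the offered utilities. The function names and numeric values are illustrative assumptions, not taken from the paper.

```python
import math

def no_purchase_prob(u0, utilities):
    """MNL no-purchase probability for an offered set with the given utilities."""
    denom = math.exp(u0) + sum(math.exp(u) for u in utilities)
    return math.exp(u0) / denom

def inclusive_value(utilities):
    """Log-sum-exp of the offered-product utilities, computed stably."""
    m = max(utilities)
    return m + math.log(sum(math.exp(u - m) for u in utilities))

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

# Illustrative (hypothetical) utilities for one offered set.
u0 = 0.3
offered = [1.2, -0.4, 0.7]

p0 = no_purchase_prob(u0, offered)
lhs = logit(p0)                      # outside-option log-odds
rhs = u0 - inclusive_value(offered)  # outside utility minus inclusive value
```

Here `lhs` and `rhs` agree up to floating-point error. Since the inclusive value depends only on in-set utilities, which can be learned from observed purchases, the unknown part of the log-odds reduces to the outside utility.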
Abstract
Firms typically cannot observe key consumer actions: whether customers buy from a competitor, choose not to buy, or even fully consider the firm's offer. This missing outside-option information makes market-size and preference estimation difficult even in simple multinomial logit (MNL) models, and it is a central obstacle in practice when only transaction data are recorded. Existing approaches often rely on auxiliary market-share, aggregated, or cross-market data. We study a complementary setting in which a black-box auxiliary predictor provides outside-option probabilities, but is potentially biased or miscalibrated because it was trained in a different channel, period, or population, or produced by an external machine-learning system. We develop calibration methods that turn such imperfect predictions into statistically valid no-purchase estimates using purchase-only data from the focal environment. First, under affine miscalibration in logit space, we show that a simple regression identifies outside-option utility parameters and yields consistent recovery of no-purchase probabilities without collecting new labels for no-purchase events. Second, under a weaker nearly monotone condition, we propose a rank-based calibration method and derive finite-sample error bounds that cleanly separate auxiliary-predictor quality from first-stage utility-learning error over observed in-set choices. Our analysis also translates estimation error into downstream decision quality for assortment optimization, quantifying how calibration accuracy affects revenue performance. The bounds provide explicit dependence on predictor alignment and utility-learning error, clarifying when each source dominates. Numerical experiments demonstrate improvements in no-purchase estimation and downstream assortment decisions, and we discuss robust aggregation extensions for combining multiple auxiliary predictors.
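To illustrate how outside-option error can propagate to assortment decisions, here is a minimal sketch under assumptions that are not from the paper: an unconstrained MNL assortment problem, where searching over revenue-ordered assortments (the top-k products by price) is known to suffice, and hypothetical prices, utilities, and outside-utility values.

```python
import math

def expected_revenue(assortment, prices, utils, u0):
    """MNL expected revenue of an assortment given outside utility u0."""
    weights = [math.exp(utils[j]) for j in assortment]
    denom = math.exp(u0) + sum(weights)
    return sum(prices[j] * w for j, w in zip(assortment, weights)) / denom

def best_revenue_ordered(prices, utils, u0):
    """Pick the best assortment among the top-k products ranked by price."""
    order = sorted(range(len(prices)), key=lambda j: -prices[j])
    candidates = [order[:k] for k in range(1, len(order) + 1)]
    return max(candidates, key=lambda s: expected_revenue(s, prices, utils, u0))

# Hypothetical instance: three products and a strong outside option.
prices = [10.0, 8.0, 4.0]
utils = [0.0, 0.5, 1.5]
true_u0 = 2.0
biased_u0 = -1.0  # assumed miscalibrated estimate that understates the outside option

s_true = best_revenue_ordered(prices, utils, true_u0)
s_biased = best_revenue_ordered(prices, utils, biased_u0)

# Both decisions are evaluated under the true outside utility.
rev_opt = expected_revenue(s_true, prices, utils, true_u0)
rev_plugin = expected_revenue(s_biased, prices, utils, true_u0)
```

In this instance the understated outside option leads the plug-in decision to drop the cheapest product, and `rev_plugin` falls below `rev_opt`. The gap between the two is the kind of revenue loss that the paper's analysis ties to calibration accuracy.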
