A Nonparametric Approach with Marginals for Modeling Consumer Choice
Yanqiu Ruan, Xiaobo Li, Karthyek Murthy, Karthik Natarajan
TL;DR
This work studies the Marginal Distribution Model (MDM), a nonparametric, marginals-only approach to discrete choice that admits tractable representation of, and prediction from, choice data collected across multiple offer sets. It gives a necessary-and-sufficient characterization of when observed choice data are MDM-representable, which can be checked by solving a polynomial-size linear program. It also proves that the set of choice probabilities MDM can represent has positive Lebesgue measure, and shows that, in general, neither MDM nor the random utility model (RUM) subsumes the other. The authors develop robust optimization frameworks to predict sales and revenues for unseen assortments, derive mixed-integer convex programs that find the best-fitting MDM choice probabilities when the data are inconsistent with MDM, and give polynomial-time schemes for structured collections of assortments. Empirical results on real (JD.com) and synthetic data show that MDM achieves competitive predictive and explanatory performance while computing substantially faster than RUM-based methods, highlighting its practical value for pricing and assortment optimization with limited data. The paper also discusses limitations related to sampling noise and correlations among utilities, and provides full proofs and supplementary material in the electronic companion.
Abstract
Given data on the choices made by consumers for different offer sets, a key challenge is to develop parsimonious models that describe and predict consumer choice behavior while being amenable to prescriptive tasks such as pricing and assortment optimization. The marginal distribution model (MDM) is one such model; it requires only the specification of the marginal distributions of the random utilities. Inspired by the usefulness of similar characterizations for the random utility model (RUM), this paper establishes necessary and sufficient conditions for given choice data to be consistent with the MDM hypothesis. This endeavor leads to an exact characterization of the set of choice probabilities that the MDM can represent. Verifying the consistency of choice data with this characterization is equivalent to solving a polynomial-sized linear program. Since the analogous verification task for RUM is computationally intractable and neither of these models subsumes the other, MDM strikes a useful balance between tractability and representational power. The characterization is then used with robust optimization to make data-driven sales and revenue predictions for new, unseen assortments. When the choice data are inconsistent with the MDM hypothesis, finding the best-fitting MDM choice probabilities reduces to solving a mixed-integer convex program. Numerical results using real-world and synthetic data demonstrate that MDM exhibits competitive representational power and prediction performance compared to RUM and parametric models, while being significantly faster to compute than RUM.
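As background on what "only the marginal distributions" means in practice, the following is a minimal sketch of how MDM choice probabilities are computed for a single offer set when deterministic utilities v_i and noise marginals F_i are known. It uses the MDM choice formula from earlier literature (e.g., Natarajan, Song and Teo, 2009), not this paper's characterization: x_i = 1 - F_i(λ - v_i), with the scalar λ chosen so that the probabilities sum to one. The function names `mdm_choice_probs` and `exponential_survival`, the bisection solver, and the exponential marginals are illustrative assumptions. The sketch does not implement the paper's linear-programming consistency check, its robust prediction framework, or its best-fit mixed-integer program.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical helper type: a survival function t -> P(eps > t) = 1 - F(t)
# for one product's noise term.
Survival = Callable[[float], float]


def mdm_choice_probs(
    v: Sequence[float],
    survival: Sequence[Survival],
    tol: float = 1e-12,
    max_iter: int = 200,
) -> List[float]:
    """Sketch: x_i = 1 - F_i(lam - v_i), with lam chosen so that sum_i x_i = 1.

    Assumes each survival function is nonincreasing, tends to 1 as t -> -inf
    and to 0 as t -> +inf, and that there are at least two alternatives.
    """

    def excess(lam: float) -> float:
        # Nonincreasing in lam, so its root can be found by bisection.
        return sum(s(lam - vi) for vi, s in zip(v, survival)) - 1.0

    lo, hi = min(v) - 1.0, max(v) + 1.0
    # Widen the bracket (doubling its width) until excess(lo) >= 0 >= excess(hi).
    for _ in range(60):
        if excess(lo) >= 0:
            break
        lo -= hi - lo
    else:
        raise ValueError("could not bracket lambda from below")
    for _ in range(60):
        if excess(hi) <= 0:
            break
        hi += hi - lo
    else:
        raise ValueError("could not bracket lambda from above")

    # Bisection on lam: keep the root inside [lo, hi].
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return [s(lam - vi) for vi, s in zip(v, survival)]


def exponential_survival(t: float) -> float:
    """Survival function of a standard exponential noise term (assumed marginal)."""
    return 1.0 if t < 0 else math.exp(-t)


if __name__ == "__main__":
    v = [1.0, 0.5, 0.0]
    x_mdm = mdm_choice_probs(v, [exponential_survival] * len(v))
    z = sum(math.exp(vi) for vi in v)
    x_mnl = [math.exp(vi) / z for vi in v]
    print(x_mdm)
    print(x_mnl)  # with exponential marginals these should agree with MDM
```

Bisection is used only because the left-hand side of the normalization equation is monotone in λ; any one-dimensional root finder would serve. With identical standard exponential marginals, the root is λ = log Σ_j e^{v_j}, and the formula reduces to the multinomial logit probabilities, which makes that case a convenient sanity check.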
