Conjugating Variational Inference for Large Mixed Multinomial Logit Models and Consumer Choice
Weiben Zhang, Ruben Loaiza-Maya, Michael Stanley Smith, Worapree Maneesoonthorn
TL;DR
The authors tackle Bayesian inference for large, high-dimensional mixed multinomial logit (MMNL) models by proposing Conjugating Variational Inference (CVI), which constructs a Gaussian approximation to the conditional posterior of the random coefficients using a second-order Taylor expansion and uses intermittently updated centers to maintain scalability. CVI significantly improves accuracy and speed relative to existing VI methods (DAVI and AVI) across standard MMNL, nested MMNL, and bundle MMNL specifications, including very large datasets with thousands of groups and millions of observations. The method is validated in simulations and applied to a large consumer-choice dataset on pasta purchases, revealing substantial store- and brand-level heterogeneity, meaningful price-elasticity patterns tied to store characteristics, and enhanced predictive performance when incorporating pasta-sauce bundles. The study demonstrates CVI’s practical value for scalable Bayesian inference in complex discrete choice models and motivates further extensions to richer data environments and skewed variational families.
Abstract
Heterogeneity in multinomial choice data is often accounted for using logit models with random coefficients. Such models are called "mixed", but they can be difficult to estimate for large datasets. We review current Bayesian variational inference (VI) methods that can do so, and propose a new VI method that scales more effectively. The key innovation is a step that efficiently updates a Gaussian approximation to the conditional posterior of the random coefficients, addressing a bottleneck within the variational optimization. The approach is used to estimate three types of mixed logit models: standard, nested and bundle variants. We first demonstrate the improvement of our new approach over existing VI methods using simulations. Our method is then applied to a large scanner panel dataset of pasta choice. We find that consumer response to price and promotion variables exhibits substantial heterogeneity at the grocery store and product levels. Store size, premium and geography are found to be drivers of store-level estimates of price elasticities. Extension to bundle choice with pasta sauce improves model accuracy further. Predictions from the mixed models are more accurate than those from fixed-coefficient equivalents, and our VI method provides insights in circumstances that other methods find challenging.
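As a rough illustration of what a Gaussian approximation to the conditional posterior of one group's random coefficients can look like, the sketch below takes a second-order Taylor expansion of the log conditional posterior of a plain multinomial logit around a chosen center. It is not the authors' algorithm: the function names (`softmax_rows`, `taylor_gaussian`), the Gaussian prior N(mu, Sigma), the data shapes, and the simulated data are all assumptions made for this example, and re-centering on every pass (as the loop does) is simply a Newton iteration rather than the intermittent update scheme the paper proposes.

```python
import numpy as np

def softmax_rows(u):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    u = u - u.max(axis=1, keepdims=True)
    e = np.exp(u)
    return e / e.sum(axis=1, keepdims=True)

def taylor_gaussian(X, y, mu, Sigma_inv, center):
    """Gaussian implied by a second-order Taylor expansion at `center`.

    Hypothetical setup (not taken from the paper):
      X         : (T, J, K) covariates for T choice occasions, J alternatives
      y         : (T,) index of the chosen alternative on each occasion
      mu        : (K,) prior mean of the group's coefficients
      Sigma_inv : (K, K) prior precision of the group's coefficients
    Returns the mean and covariance of the Gaussian approximation.
    """
    T = X.shape[0]
    p = softmax_rows(X @ center)               # (T, J) choice probabilities
    x_chosen = X[np.arange(T), y]              # (T, K) covariates of chosen items
    x_bar = np.einsum("tj,tjk->tk", p, X)      # (T, K) probability-weighted means
    # Gradient of log-likelihood plus log-prior at the center.
    grad = (x_chosen - x_bar).sum(axis=0) - Sigma_inv @ (center - mu)
    # Negative Hessian: sum_t X_t' (diag(p_t) - p_t p_t') X_t, plus prior precision.
    info = np.einsum("tj,tjk,tjl->kl", p, X, X) - x_bar.T @ x_bar
    precision = info + Sigma_inv
    cov = np.linalg.inv(precision)
    mean = center + cov @ grad                 # maximizer of the quadratic expansion
    return mean, cov

# Small simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
T, J, K = 50, 3, 2
X = rng.normal(size=(T, J, K))
beta_true = np.array([1.0, -0.5])
probs = softmax_rows(X @ beta_true)
y = np.array([rng.choice(J, p=pt) for pt in probs])

mu, Sigma_inv = np.zeros(K), np.eye(K)
center = mu.copy()
for _ in range(5):                             # re-center at the latest mean
    center, cov = taylor_gaussian(X, y, mu, Sigma_inv, center)
print("approximate posterior mean:", center)
print("approximate posterior covariance:\n", cov)
```

The main cost is forming and inverting a K-by-K matrix for every group each time the center moves. Holding the center fixed over several optimization steps avoids repeating that work, which is presumably the kind of saving intermittent updating is meant to give.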
