Conjugating Variational Inference for Large Mixed Multinomial Logit Models and Consumer Choice

Weiben Zhang, Ruben Loaiza-Maya, Michael Stanley Smith, Worapree Maneesoonthorn

TL;DR

The authors tackle Bayesian inference for large, high-dimensional mixed multinomial logit (MMNL) models by proposing Conjugating Variational Inference (CVI), which constructs a Gaussian approximation to the conditional posterior of the random coefficients using a second-order Taylor expansion and uses intermittently updated centers to maintain scalability. CVI significantly improves accuracy and speed relative to existing VI methods (DAVI and AVI) across standard MMNL, nested MMNL, and bundle MMNL specifications, including very large datasets with thousands of groups and millions of observations. The method is validated in simulations and applied to a large consumer-choice dataset on pasta purchases, revealing substantial store- and brand-level heterogeneity, meaningful price-elasticity patterns tied to store characteristics, and enhanced predictive performance when incorporating pasta-sauce bundles. The study demonstrates CVI’s practical value for scalable Bayesian inference in complex discrete choice models and motivates further extensions to richer data environments and skewed variational families.

Abstract

Heterogeneity in multinomial choice data is often accounted for using logit models with random coefficients. Such models are called "mixed", but they can be difficult to estimate for large datasets. We review current Bayesian variational inference (VI) methods that can do so, and propose a new VI method that scales more effectively. The key innovation is a step that efficiently updates a Gaussian approximation to the conditional posterior of the random coefficients, addressing a bottleneck within the variational optimization. The approach is used to estimate three types of mixed logit models: standard, nested and bundle variants. We first demonstrate the improvement of our new approach over existing VI methods using simulations. Our method is then applied to a large scanner panel dataset of pasta choice. We find that consumer response to price and promotion variables exhibits substantial heterogeneity at the grocery store and product levels. Store size, premium and geography are found to be drivers of store-level estimates of price elasticities. Extension to bundle choice with pasta sauce improves model accuracy further. Predictions from the mixed models are more accurate than those from fixed-coefficient equivalents, and our VI method provides insights in circumstances which other methods find challenging.
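
To make the central idea concrete, the following is a minimal Python sketch of how a second-order Taylor expansion of a multinomial logit log-likelihood around a center `a` can be combined with a Gaussian prior to give a Gaussian approximation to the conditional posterior of one group's random coefficients. It is not the authors' CVI algorithm. The single-group conditional logit layout (`X` of shape `(T, J, p)`), the Gaussian prior arguments, and all function names are assumptions made for illustration only.

```python
import numpy as np


def mnl_loglik(alpha, X, y):
    """Log-likelihood of a conditional logit for one group.

    X has shape (T, J, p): T choice occasions, J alternatives, p covariates.
    y has shape (T,) and holds the index of the chosen alternative.
    """
    util = X @ alpha                                   # (T, J) utilities
    top = util.max(axis=1, keepdims=True)              # stabilise the log-sum-exp
    lse = top[:, 0] + np.log(np.exp(util - top).sum(axis=1))
    return (util[np.arange(len(y)), y] - lse).sum()


def mnl_grad_hess(alpha, X, y):
    """Gradient and Hessian of mnl_loglik with respect to alpha."""
    util = X @ alpha
    util -= util.max(axis=1, keepdims=True)
    prob = np.exp(util)
    prob /= prob.sum(axis=1, keepdims=True)            # (T, J) choice probabilities
    x_chosen = X[np.arange(len(y)), y]                 # (T, p) covariates of chosen items
    x_mean = np.einsum("tj,tjp->tp", prob, X)          # probability-weighted covariates
    grad = (x_chosen - x_mean).sum(axis=0)
    second_moment = np.einsum("tj,tjp,tjq->pq", prob, X, X)
    hess = -(second_moment - x_mean.T @ x_mean)        # negative semidefinite
    return grad, hess


def gaussian_conditional(a, X, y, prior_mean, prior_cov):
    """Gaussian approximation to p(alpha | data) from a Taylor expansion at a.

    The quadratic expansion of the log-likelihood around a is conjugate to the
    Gaussian prior N(prior_mean, prior_cov), so the result is again Gaussian.
    """
    grad, hess = mnl_grad_hess(a, X, y)
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = prior_prec - hess                      # positive definite
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + grad - hess @ a)
    return post_mean, post_cov
```

In this sketch the approximation is only as good as the choice of center: setting `a` to the returned mean and calling `gaussian_conditional` again is a Newton step toward the mode of the conditional posterior under the stated assumptions.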

Paper Structure (25 sections, 47 equations, 6 figures, 18 tables, 5 algorithms)

This paper contains 25 sections, 47 equations, 6 figures, 18 tables, 5 algorithms.

Figures (6)

  • Figure 1: Posterior densities of random coefficients for a representative group in the small MMNL simulation. The exact posterior is shaded, while four variational posteriors are given as lines. Each row corresponds to random coefficients associated with a specific alternative, and each column corresponds to the random coefficients of a specific covariate. For identification purposes, the coefficients of the first (reference) alternative are fixed at $\boldsymbol{0}$.
  • Figure 2: Comparison of F1 scores from four VI methods for the large simulation examples. Values are the differences in scores between each VI method and DAVI. The rows correspond to Simulations 1, 2 and 3. The first column shows scores for the training data, and the second column shows scores for the test data. Positive/negative values indicate higher/lower predictive accuracy than DAVI. Boxplots exclude outliers, defined as observations more than 1.5 × IQR from the box. Equivalent plots for the log-score are found in the Online Appendix.
  • Figure 3: Own-price elasticities for the most popular pasta alternative, "Private Label Thin Spaghetti". They are computed for three representative stores using their store-specific random coefficients.
  • Figure C1: Approximation accuracy of the second-order Taylor expansion for a representative group from Simulation 1 at different optimization steps $iter$. The exact likelihood is evaluated at $\boldsymbol{\alpha}^{(iter)}$, while the approximated likelihood is constructed using a second-order Taylor expansion around $\boldsymbol{a}^{(iter)}$ and also evaluated at $\boldsymbol{\alpha}^{(iter)}$. The value of $\alpha^{(iter)}_{(3,2)}$ is varied along the grid to trace out the likelihood profiles, while all other coefficients remain fixed.
  • Figure C2: Posteriors of random coefficients of a representative group for the smaller simulated B-MMNL example: The figure displays the posterior distributions obtained using three VI methods and MCMC. Each row represents the random coefficients associated with a specific alternative, while each column corresponds to the random coefficients of a specific covariate across alternatives. For identification purposes, the coefficients of the first (reference) alternative are fixed at $0$.
  • ...and 1 more figure