Bayesian Deep Learning for Discrete Choice
Daniel F. Villarraga, Ricardo A. Daziano
TL;DR
The paper introduces a Bayesian deep learning architecture for discrete choice that combines a knowledge-informed linear component with nonlinear IIA and non-IIA blocks (blocks that do or do not impose independence of irrelevant alternatives), and uses Stochastic Gradient Langevin Dynamics (SGLD) to sample from the posterior. A two-step learning procedure guides the model toward simple, behaviorally plausible hypotheses when data are scarce while allowing it to capture more complex patterns as data grow; it also supports the construction of credible intervals for economic quantities such as marginal utilities and the value of travel time (VOTT). In a Monte Carlo study and two case studies (NYC mode choice and Swiss train data), the approach achieves empirical coverage near 95% for marginal rates of substitution (MRS), competitive out-of-sample predictive accuracy, and economically sensible inferences, outperforming fully connected networks and a baseline conditional logit in several settings. The work demonstrates that Bayesian deep learning can deliver both predictive accuracy and reliable, interpretable inference in discrete choice contexts, addressing key concerns about uncertainty representation and interpretability in DL applications.
Abstract
Discrete choice models (DCMs) are used to analyze individual decision-making in contexts such as transportation choices, political elections, and consumer preferences. DCMs play a central role in applied econometrics by enabling inference on key economic variables, such as marginal rates of substitution, rather than focusing solely on predicting choices on new unlabeled data. However, while traditional DCMs offer high interpretability and support for point and interval estimation of economic quantities, these models often underperform in predictive tasks compared to deep learning (DL) models. Despite their predictive advantages, DL models remain largely underutilized in discrete choice due to concerns about their lack of interpretability, unstable parameter estimates, and the absence of established methods for uncertainty quantification. Here, we introduce a deep learning model architecture specifically designed to integrate with approximate Bayesian inference methods, such as Stochastic Gradient Langevin Dynamics (SGLD). Our proposed model collapses to behaviorally informed hypotheses when data are limited, mitigating overfitting and instability in underspecified settings while retaining the flexibility to capture complex nonlinear relationships when sufficient data are available. We demonstrate our approach using SGLD through a Monte Carlo simulation study, evaluating both predictive metrics, such as out-of-sample balanced accuracy, and inferential metrics, such as empirical coverage for interval estimates of marginal rates of substitution. Additionally, we present results from two empirical case studies: one using revealed mode choice data in NYC, and the other based on the widely used Swiss train choice stated preference data.
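To make the inference idea concrete, the following is a minimal sketch of SGLD applied to a plain conditional logit, followed by a credible interval for a marginal rate of substitution computed from the posterior draws. It is not the architecture proposed in the paper: a linear utility is swapped in for the deep network, the data are synthetic, and every name and setting here (simulate_choices, grad_log_lik, sgld, the step size, the Gaussian prior, the burn-in length) is an assumption made for illustration.

```python
import numpy as np


def simulate_choices(rng, n, beta, n_alts=3):
    """Synthetic data: n choice situations, n_alts alternatives, len(beta) attributes."""
    X = rng.normal(size=(n, n_alts, beta.size))
    # Gumbel errors yield conditional logit choice probabilities.
    utility = X @ beta + rng.gumbel(size=(n, n_alts))
    return X, utility.argmax(axis=1)


def grad_log_lik(beta, X, y):
    """Gradient of the conditional logit log-likelihood, summed over a batch."""
    v = X @ beta
    v -= v.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(v)
    p /= p.sum(axis=1, keepdims=True)
    chosen = X[np.arange(len(y)), y]          # attributes of the chosen alternative
    expected = np.einsum("bj,bjk->bk", p, X)  # probability-weighted attributes
    return (chosen - expected).sum(axis=0)


def sgld(X, y, rng, n_steps=20000, batch=200, step=5e-5, prior_sd=10.0):
    """SGLD with a constant step size and an independent N(0, prior_sd^2) prior."""
    n, _, k = X.shape
    beta = np.zeros(k)
    draws = np.empty((n_steps, k))
    for t in range(n_steps):
        idx = rng.choice(n, size=batch, replace=False)
        # Stochastic estimate of the log-posterior gradient:
        # prior term plus the minibatch likelihood term rescaled by n / batch.
        grad = -beta / prior_sd**2 + (n / batch) * grad_log_lik(beta, X[idx], y[idx])
        # Langevin update: half-step along the gradient plus N(0, step) noise.
        beta = beta + 0.5 * step * grad + np.sqrt(step) * rng.normal(size=k)
        draws[t] = beta
    return draws


rng = np.random.default_rng(0)
true_beta = np.array([-1.0, -0.5])  # hypothetical (time, cost) coefficients
X, y = simulate_choices(rng, n=2000, beta=true_beta)

draws = sgld(X, y, rng)
kept = draws[2000:]  # discard an assumed burn-in period

# MRS of time for cost: ratio of the two marginal utilities, draw by draw.
mrs = kept[:, 0] / kept[:, 1]
lower, upper = np.percentile(mrs, [2.5, 97.5])
print(f"true MRS = {true_beta[0] / true_beta[1]:.2f}")
print(f"posterior median = {np.median(mrs):.2f}, 95% interval = [{lower:.2f}, {upper:.2f}]")
```

Because the ratio is computed on each posterior draw, the interval for the MRS follows directly from the samples without a delta-method approximation. A constant step size makes the sampler only approximate, so the interval width in this sketch should be read as indicative rather than exact.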
