Bayesian Deep Learning for Discrete Choice

Daniel F. Villarraga, Ricardo A. Daziano

TL;DR

The paper introduces a Bayesian deep learning architecture for discrete choice that combines a knowledge-informed linear component with nonlinear independence of irrelevant alternatives (IIA) and non-IIA blocks, and uses Stochastic Gradient Langevin Dynamics (SGLD) to sample from the posterior. A two-step learning procedure guides the model toward simple, behaviorally plausible hypotheses when data are scarce while allowing it to capture complex patterns as data grow; the resulting posterior samples support credible intervals for economic quantities such as marginal utilities and the value of travel time (VOTT). In a Monte Carlo study and two case studies (NYC mode choice and Swiss train choice data), the approach achieves empirical coverage near 95% for marginal rates of substitution (MRS), competitive out-of-sample predictive accuracy, and economically sensible inferences, outperforming fully connected networks and a baseline conditional logit in several settings. The work demonstrates that Bayesian deep learning can deliver both predictive accuracy and reliable, interpretable inference in discrete choice, addressing key concerns about uncertainty representation and interpretability in DL applications.
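
To make the SGLD-plus-credible-interval workflow concrete, here is a minimal sketch in Python/NumPy. It assumes a plain two-attribute conditional logit (time and cost) on simulated data, an independent Gaussian prior on each coefficient, and a constant step size, whereas the original SGLD formulation uses a decreasing step schedule. It does not reproduce the paper's architecture or its two-step learning procedure; the helper names (`softmax_rows`, `simulate_choices`, `grad_log_lik`, `sgld`) and all tuning values are hypothetical. Each posterior draw of the coefficients yields a draw of VOTT as their ratio, and percentiles of those draws give an equal-tailed credible interval.

```python
import numpy as np

def softmax_rows(V):
    """Row-wise softmax over alternatives, shifted by the row max for numerical stability."""
    E = np.exp(V - V.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def simulate_choices(n_obs, beta, n_alts=3, seed=0):
    """Toy conditional-logit data (hypothetical): X has shape (n_obs, n_alts, n_attrs)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 2.0, size=(n_obs, n_alts, len(beta)))
    P = softmax_rows(X @ beta)
    y = np.array([rng.choice(n_alts, p=p) for p in P])
    return X, y

def grad_log_lik(theta, X, y):
    """Gradient of the conditional-logit log-likelihood, summed over the given rows."""
    P = softmax_rows(X @ theta)
    chosen = X[np.arange(len(y)), y]            # attributes of the chosen alternative
    expected = np.einsum("ij,ijk->ik", P, X)    # probability-weighted attributes
    return (chosen - expected).sum(axis=0)

def sgld(X, y, n_iter=5000, batch=500, step=5e-5, prior_sd=10.0, seed=1):
    """Constant-step SGLD with an assumed N(0, prior_sd^2) prior on each coefficient."""
    rng = np.random.default_rng(seed)
    n, k = len(y), X.shape[2]
    theta = np.zeros(k)
    draws = np.empty((n_iter, k))
    for t in range(n_iter):
        idx = rng.choice(n, size=batch, replace=False)
        # Unbiased minibatch estimate of the log-posterior gradient.
        grad = -theta / prior_sd**2 + (n / batch) * grad_log_lik(theta, X[idx], y[idx])
        # Langevin step: half-step drift plus Gaussian noise with variance `step`.
        theta = theta + 0.5 * step * grad + rng.normal(0.0, np.sqrt(step), size=k)
        draws[t] = theta
    return draws

# Attributes are (time, cost); the true VOTT in this toy setup is -1.0 / -0.5 = 2.
X, y = simulate_choices(5000, beta=np.array([-1.0, -0.5]))
draws = sgld(X, y)[1000:]                 # discard burn-in
vott = draws[:, 0] / draws[:, 1]          # ratio of marginal utilities (an MRS)
lo, hi = np.percentile(vott, [2.5, 97.5])
print(f"95% credible interval for VOTT: [{lo:.2f}, {hi:.2f}]")
```

With a constant step, the minibatch gradient noise is never annealed away, so the sampler is only approximate; smaller steps or larger batches reduce that bias at the cost of slower mixing.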

Abstract

Discrete choice models (DCMs) are used to analyze individual decision-making in contexts such as transportation choices, political elections, and consumer preferences. DCMs play a central role in applied econometrics by enabling inference on key economic variables, such as marginal rates of substitution, rather than focusing solely on predicting choices on new unlabeled data. However, while traditional DCMs offer high interpretability and support point and interval estimation of economic quantities, they often underperform deep learning (DL) models in predictive tasks. Despite their predictive advantages, DL models remain largely underutilized in discrete choice because of concerns about their lack of interpretability, unstable parameter estimates, and the absence of established methods for uncertainty quantification. Here, we introduce a deep learning model architecture specifically designed to integrate with approximate Bayesian inference methods, such as Stochastic Gradient Langevin Dynamics (SGLD). Our proposed model collapses to behaviorally informed hypotheses when data are limited, mitigating overfitting and instability in underspecified settings, while retaining the flexibility to capture complex nonlinear relationships when sufficient data are available. We demonstrate our approach using SGLD in a Monte Carlo simulation study, evaluating both predictive metrics, such as out-of-sample balanced accuracy, and inferential metrics, such as the empirical coverage of interval estimates for marginal rates of substitution. Additionally, we present results from two empirical case studies: one using revealed preference mode choice data from NYC, and the other based on the widely used Swiss train stated preference choice data.
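
The two evaluation metrics named above can be sketched as follows. This assumes balanced accuracy is the unweighted mean of per-alternative recall (the usual definition, as in scikit-learn's `balanced_accuracy_score`) and that empirical coverage is the share of Monte Carlo replications whose interval contains the true MRS. The function names are hypothetical, and the toy arrays are illustrative, not results from the paper.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred, n_classes):
    """Mean per-alternative recall, so rarely chosen alternatives weigh as much as common ones."""
    recalls = []
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():                      # skip alternatives never chosen in the test set
            recalls.append(np.mean(y_pred[mask] == c))
    return float(np.mean(recalls))

def empirical_coverage(intervals, true_value):
    """Share of replications whose [lower, upper] interval contains the true value."""
    intervals = np.asarray(intervals)
    inside = (intervals[:, 0] <= true_value) & (true_value <= intervals[:, 1])
    return float(inside.mean())

# Toy predictions for three alternatives.
y_true = np.array([0, 0, 0, 1, 2, 2])
y_pred = np.array([0, 0, 1, 1, 2, 0])
ba = balanced_accuracy(y_true, y_pred, n_classes=3)    # (2/3 + 1 + 1/2) / 3

# Toy 95% intervals from three hypothetical replications with true MRS = 2.0.
intervals = [[1.8, 2.2], [2.1, 2.5], [1.5, 2.05]]
cov = empirical_coverage(intervals, true_value=2.0)    # 2 of 3 contain 2.0
```

In a real coverage study the number of replications would be large, and a well-calibrated 95% interval should yield coverage close to 0.95.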

Paper Structure

This paper contains 21 sections, 20 equations, 17 figures, 5 tables, and 1 algorithm.

Figures (17)

  • Figure 1: Conditional logistic regression as a shallow neural network.
  • Figure 2: Binary Skip-GNN model architecture (2025arXiv250309786V).
  • Figure 3: Multinomial Skip-GNN with IIA model architecture (2025arXiv250309786V).
  • Figure 4: Representation of uncertainty in the prediction $f(\bm{x};\bm{\Theta})$. The prediction bands represent $(1-\alpha)$ credible intervals. The solid red line depicts the true data-generating process, while observations are represented by blue dots. The highlighted region shows an interval in the input space with high epistemic uncertainty.
  • Figure 5: Proposed deep learning model architecture.
  • ...and 12 more figures
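
Figure 1's view of the conditional logit as a shallow neural network can be made concrete: utilities come from a linear layer whose weights are shared across alternatives, with optional alternative-specific constants acting as biases, followed by a softmax output. The sketch below assumes NumPy; the function name and toy numbers are hypothetical, not taken from the paper.

```python
import numpy as np

def conditional_logit_probs(X, beta, asc=None):
    """Choice probabilities of a conditional logit, written as a one-layer network.

    X:    (n_obs, n_alts, n_attrs) attributes of each alternative
    beta: (n_attrs,) taste coefficients, shared by all alternatives
    asc:  optional (n_alts,) alternative-specific constants (layer biases)
    """
    V = X @ beta                              # linear layer: V_ij = x_ij' beta
    if asc is not None:
        V = V + asc                           # one bias per output unit
    V = V - V.max(axis=1, keepdims=True)      # shift for numerical stability
    expV = np.exp(V)
    return expV / expV.sum(axis=1, keepdims=True)   # softmax output layer

# One decision maker, two alternatives described by (time, cost).
X = np.array([[[10.0, 2.0], [20.0, 1.0]]])
beta = np.array([-0.1, -0.5])
p = conditional_logit_probs(X, beta)          # utilities -2.0 and -2.5

# Adding a third alternative leaves the ratio p[0]/p[1] unchanged (the IIA property).
X3 = np.array([[[10.0, 2.0], [20.0, 1.0], [15.0, 1.5]]])
p3 = conditional_logit_probs(X3, beta)
```

Because the probability ratio between any two alternatives depends only on their own utilities, this shallow network satisfies IIA by construction, and each shared coefficient keeps a direct reading as a marginal utility.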