Amortized Inference for Correlated Discrete Choice Models via Equivariant Neural Networks

Easton Huch, Michael Keane

Abstract

Discrete choice models are fundamental tools in management science, economics, and marketing for understanding and predicting decision-making. Logit-based models are dominant in applied work, largely due to their convenient closed-form expressions for choice probabilities. However, these models entail restrictive assumptions on the stochastic utility component, constraining our ability to capture realistic and theoretically grounded choice behavior, most notably substitution patterns. In this work, we propose an amortized inference approach using a neural network emulator to approximate choice probabilities for general error distributions, including those with correlated errors. Our proposal includes a specialized neural network architecture and accompanying training procedures designed to respect the invariance properties of discrete choice models. We provide group-theoretic foundations for the architecture, including a proof of universal approximation given a minimal set of invariant features. Once trained, the emulator enables rapid likelihood evaluation and gradient computation. We use Sobolev training, augmenting the likelihood loss with a gradient-matching penalty so that the emulator learns both choice probabilities and their derivatives. We show that emulator-based maximum likelihood estimators are consistent and asymptotically normal under mild approximation conditions, and we provide sandwich standard errors that remain valid even with imperfect likelihood approximation. Simulations show significant gains over the GHK simulator in accuracy and speed.
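To make the Sobolev training idea concrete, the following is a minimal PyTorch sketch of a loss of the kind the abstract describes: a cross-entropy term on simulated choice probabilities plus a gradient-matching penalty on their derivatives. The emulator `net`, the feature layout `x`, the targets, and the weight `lam` are illustrative assumptions, not the paper's implementation.

```python
import torch

def sobolev_loss(net, x, p_true, grad_true, lam=1.0):
    """Likelihood loss plus a gradient-matching penalty (Sobolev training).

    x         : (batch, d) input features (e.g., utilities and covariance terms)
    p_true    : (batch, K) simulated choice probabilities
    grad_true : (batch, K, d) their derivatives w.r.t. x, assumed to be
                available from the training-data generation step
    lam       : weight on the gradient-matching term
    """
    x = x.requires_grad_(True)
    p_hat = net(x)  # (batch, K) emulated choice probabilities
    # Cross-entropy between simulated and emulated choice probabilities
    ce = -(p_true * torch.log(p_hat + 1e-12)).sum(dim=1).mean()
    # Jacobian of the emulated probabilities w.r.t. the inputs,
    # matched to the simulated derivative targets
    jac = torch.stack([
        torch.autograd.grad(p_hat[:, k].sum(), x, create_graph=True)[0]
        for k in range(p_hat.shape[1])
    ], dim=1)  # (batch, K, d)
    grad_pen = ((jac - grad_true) ** 2).mean()
    return ce + lam * grad_pen
```

Because `create_graph=True` keeps the computation graph, the penalty itself is differentiable, so a single optimizer step trains the emulator to match both probabilities and derivatives.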

Paper Structure

This paper contains 38 sections, 25 theorems, 78 equations, 3 figures, and 8 tables.

Key Result

Theorem 1

For each alternative $j \in \{1, \ldots, K\}$, there exists a closed, measure-zero set $\mathcal{B}_j \subset \mathcal{X}_K$ such that for any $(\boldsymbol{v}^*_1, \boldsymbol{\Sigma}^*_1), (\boldsymbol{v}^*_2, \boldsymbol{\Sigma}^*_2) \in \mathcal{X}_K \setminus \mathcal{B}_j$:
$$p_j(\boldsymbol{v}^*_1, \boldsymbol{\Sigma}^*_1) = p_j(\boldsymbol{v}^*_2, \boldsymbol{\Sigma}^*_2) \iff (\boldsymbol{v}^*_2, \boldsymbol{\Sigma}^*_2) = \sigma \cdot (\boldsymbol{v}^*_1, \boldsymbol{\Sigma}^*_1) \text{ for some } \sigma \in S_{K-1}^{(j)},$$
where $S_{K-1}^{(j)}$ is the subgroup of permutations fixing alternative $j$.
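As a numerical sanity check on the invariance underlying this theorem, the short NumPy sketch below estimates a choice probability by Monte Carlo and verifies that permuting the alternatives other than $j$, together with the corresponding rows and columns of the covariance, leaves $p_j$ unchanged. It uses the raw random-utility parameterization $U = v + \varepsilon$, $\varepsilon \sim N(0, \Sigma)$, for illustration, not the paper's reduced quantities $(\boldsymbol{v}^*, \boldsymbol{\Sigma}^*)$.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_draws, j = 4, 200_000, 0  # alternative j = 0 is held fixed

# Random systematic utilities and a random positive-definite error covariance
v = rng.normal(size=K)
A = rng.normal(size=(K, K))
Sigma = A @ A.T + np.eye(K)

def choice_prob(v, Sigma, j):
    """Monte Carlo estimate of P(alternative j has the highest utility)."""
    L = np.linalg.cholesky(Sigma)
    eps = rng.normal(size=(n_draws, K)) @ L.T  # draws with covariance Sigma
    U = v + eps
    return (U.argmax(axis=1) == j).mean()

# Permute the alternatives other than j (here: swap alternatives 1 and 2)
perm = np.array([0, 2, 1, 3])
v_perm = v[perm]
Sigma_perm = Sigma[np.ix_(perm, perm)]

# The two estimates agree up to Monte Carlo error
print(choice_prob(v, Sigma, j))
print(choice_prob(v_perm, Sigma_perm, j))
```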

Figures (3)

  • Figure 1: Architecture of the neural network emulator. For each alternative $j$, the per-alternative encoder processes diagonal features (relating $j$ to each other alternative), off-diagonal features (pairwise features among alternatives other than $j$), and alternative $j$'s own features. These are combined via a combining MLP to produce representation $\boldsymbol{z}_j$. Representations for all alternatives are stacked and processed through permutation-equivariant layers to produce the final logits. (A minimal sketch of one such equivariant layer follows this figure list.)
  • Figure 2: Training and evaluation loss over $400{,}000$ training episodes for emulators with $K \in \{3, 5, 10\}$ alternatives. The training loss (blue) decreases consistently throughout training, and the evaluation loss (orange) closely tracks the training loss, indicating that the emulators do not overfit the simulated training data.
  • Figure 3: (a) Training and evaluation loss over $400{,}000$ training episodes for emulators with $K \in \{3, 4, 5\}$ alternatives. The training loss (blue) decreases consistently throughout training, and the evaluation loss (orange) closely tracks the training loss, indicating minimal overfitting. (b) Emulator probability estimates and their estimation errors. The emulator simultaneously learns the choice probabilities for $K = 3,\,4,\,5$ with small error.
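For concreteness, here is a minimal DeepSets-style sketch of a permutation-equivariant layer of the kind the final stage of Figure 1 describes. The layer widths, mean pooling, and logit head are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class EquivariantLayer(nn.Module):
    """DeepSets-style layer: each alternative's output depends on its own
    representation plus a symmetric (mean) pooling over all alternatives,
    so permuting the alternatives permutes the outputs identically."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.own = nn.Linear(d_in, d_out)
        self.pool = nn.Linear(d_in, d_out)

    def forward(self, z):                        # z: (batch, K, d_in)
        pooled = z.mean(dim=1, keepdim=True)     # (batch, 1, d_in)
        return torch.relu(self.own(z) + self.pool(pooled))

# Stacked representations z_j from the combining MLP pass through a few
# equivariant layers, then project to one logit per alternative:
head = nn.Sequential(EquivariantLayer(32, 32), EquivariantLayer(32, 32))
logits_proj = nn.Linear(32, 1)

z = torch.randn(8, 5, 32)                  # batch of 8, K = 5 alternatives
logits = logits_proj(head(z)).squeeze(-1)  # (8, 5) equivariant logits
```

Because the pooling is symmetric, reordering the $K$ alternatives in `z` reorders `logits` in exactly the same way, which is the equivariance property the architecture is built to respect.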

Theorems & Definitions (68)

  • Theorem 1: Generic Separation
  • Theorem 2: MLP Universal Approximation
  • Proof (sketch)
  • Corollary 1: Architecture Universal Approximation
  • Proof
  • Theorem 3: Consistency
  • Proof
  • Theorem 4: Asymptotic Normality
  • Remark 1: Approximate Maximizers
  • Proposition 1: Consistent Estimation of Fisher Information
  • ...and 58 more