Modeling Choice via Self-Attention

Joohwan Ko; Andrew A. Li

TL;DR

The paper introduces a self-attention-based choice model, the Low-Rank Halo MNL (LR-Halo MNL), which generalizes the Halo-MNL through a low-rank decomposition $H = \mathrm{diag}(\alpha) + UV^\top$ with $U, V \in \mathbb{R}^{m \times r}$, and which can be implemented as a simple network with a single self-attention head. It proves that the sample complexity improves from $\Omega(m^2)$ for the Halo-MNL to $O(rm)$ for the LR-Halo MNL under a nonconvex, regularized estimation framework: with $n = \Omega(rm \log m)$ samples, there is a local optimum close to the true parameters. The authors establish a large-scale benchmark by expanding the IRI Academic Dataset to cover an entire year, 31 product categories, and up to 20 products, and they conduct extensive evaluations against traditional and deep-learning-based choice models. Empirically, the LR-Halo MNL achieves superior predictive performance, particularly in the long-term (52-week) analyses, demonstrating the practical relevance and robustness of the approach for assortment, inventory, and price optimization. Overall, the work provides both strong theoretical guarantees and compelling empirical evidence that a self-attention-driven, low-rank choice model can outperform existing methods on realistic, large-scale data.
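To make the decomposition concrete, here is a minimal Python sketch of LR-Halo MNL choice probabilities. It assumes a convention that may not match the paper's exactly: the utility of product $j$ in offer set $S$ is the row sum $z_j = \sum_{k \in S} H_{jk}$ (so $H_{jk}$ is the effect of product $k$ on product $j$, and the diagonal carries base utilities), and choice probabilities are a softmax of $z$ over $S$. The function names `low_rank_halo` and `choice_probs` and all numbers are hypothetical.

```python
import math

def low_rank_halo(alpha, U, V):
    """Return H = diag(alpha) + U V^T as an m x m list of lists.

    alpha: length-m list; U, V: m x r lists of lists (rank-r factors).
    """
    m, r = len(alpha), len(U[0])
    H = [[sum(U[j][t] * V[k][t] for t in range(r)) for k in range(m)]
         for j in range(m)]
    for j in range(m):
        H[j][j] += alpha[j]  # diagonal carries the per-product base utility
    return H

def choice_probs(H, offer_set):
    """P(j | S) for every j in S, assuming z_j = sum_{k in S} H[j][k]."""
    S = sorted(offer_set)
    z = {j: sum(H[j][k] for k in S) for j in S}
    z_max = max(z.values())  # shift by the max for numerical stability
    w = {j: math.exp(z[j] - z_max) for j in S}
    total = sum(w.values())
    return {j: w[j] / total for j in S}

# Toy example (hypothetical numbers): 3 products, rank r = 1.
alpha = [1.0, 0.5, 0.0]
U = [[1.0], [0.0], [0.0]]
V = [[0.0], [0.0], [1.0]]  # so (U V^T)[0][2] = 1: product 2 boosts product 0
H = low_rank_halo(alpha, U, V)
p_full = choice_probs(H, {0, 1, 2})
p_pair = choice_probs(H, {0, 1})
print(p_full[0] / p_full[1], p_pair[0] / p_pair[1])
```

Setting $U$ and $V$ to zero leaves $H = \mathrm{diag}(\alpha)$, so the sketch collapses to a plain MNL with utilities $\alpha$. In the rank-1 toy example, offering product 2 raises product 0's utility, so the ratio $P(0 \mid S) / P(1 \mid S)$ depends on the assortment, a context effect that a plain MNL cannot express.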

Abstract

Models of choice are a fundamental input to many now-canonical optimization problems in Operations Management, including assortment, inventory, and price optimization. Accurate estimation of these models from data is therefore a critical step in applying these optimization problems in practice. Concurrently, recent advances in deep learning have sparked interest in integrating these techniques into choice modeling. However, there is a noticeable research gap at the intersection of deep learning and choice modeling, particularly for work with both theoretical and empirical foundations. Thus motivated, we propose a choice model that is the first to successfully (both theoretically and practically) leverage a modern neural network architectural concept: self-attention. Theoretically, we show that our attention-based choice model is a low-rank generalization of the Halo Multinomial Logit (Halo-MNL) model. We prove that whereas the Halo-MNL requires $\Omega(m^2)$ data samples to estimate, where $m$ is the number of products, our model supports a natural nonconvex estimator (in particular, the one a standard neural network implementation would apply) that admits a near-optimal stationary point with $O(m)$ samples. Additionally, we establish the first realistic-scale benchmark for choice model estimation on real data and conduct the most extensive evaluation of existing models to date, which highlights our model's superior performance.
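A rough way to see why a low-rank structure helps is to count free parameters. The sketch below compares the $m^2$ entries of a full Halo-MNL interaction matrix with the $m + 2rm$ entries of $\alpha$, $U$, and $V$ in $H = \mathrm{diag}(\alpha) + UV^\top$, where $U, V \in \mathbb{R}^{m \times r}$. The helper names and the rank $r = 2$ are hypothetical, and the count ignores identifiability constraints such as the softmax's invariance to additive shifts. Parameter counting is only intuition; the paper's guarantee concerns its specific nonconvex estimator.

```python
def halo_mnl_param_count(m):
    # Full Halo-MNL: one free entry per ordered pair (j, k) of the m x m matrix.
    return m * m

def lr_halo_param_count(m, r):
    # LR-Halo MNL: alpha (m entries) plus U and V (m x r entries each).
    return m + 2 * r * m

# m = 20 matches the largest assortments in the benchmark; r = 2 is hypothetical.
for m in (20, 100, 1000):
    print(m, halo_mnl_param_count(m), lr_halo_param_count(m, r=2))
```

At $m = 20$ and $r = 2$ the counts are 400 versus 100; as $m$ grows, the full matrix grows quadratically while the low-rank parameterization grows linearly.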

Paper Structure (22 sections, 1 theorem, 24 equations, 2 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Relation to Halo-MNL
Relation to Attention-Based Approaches
Choice Models as Neural Networks
Our Model Through Two Lenses
Estimation and Theoretical Guarantees
Experiment
Dataset Description
Hotel Dataset
IRI Academic Dataset
Benchmark Models
Experiment Details
Results
Conclusion
...and 7 more sections

Key Result

Proposition 1

The Low-Rank Halo MNL choice model can be represented as a neural network with a single self-attention head.
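One way to read Proposition 1 is sketched below in Python. The construction is an illustrative assumption, not necessarily the paper's exact architecture: products get one-hot embeddings, the query and key projections are $U$ and $V$ (so the score between products $j$ and $k$ is $(UV^\top)_{jk}$), the head uses unnormalized scores with unit value vectors summed over the offer set (including the product itself), and $\alpha_j$ enters as a per-product bias before a final softmax over $S$. Under these assumptions the output equals $P(j \mid S) \propto \exp\big(\sum_{k \in S} H_{jk}\big)$ with $H = \mathrm{diag}(\alpha) + UV^\top$. The names `matmul` and `attention_choice_probs` and all numbers are hypothetical.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product of A (n x p) and B (p x q)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def attention_choice_probs(alpha, U, V, offer_set):
    """Choice probabilities from a single unnormalized self-attention head.

    alpha: length-m base utilities; U, V: m x r query/key weight matrices.
    """
    S = sorted(offer_set)
    m = len(alpha)
    # One-hot embedding for each offered product (one row per product in S).
    X = [[1.0 if i == j else 0.0 for i in range(m)] for j in S]
    Q = matmul(X, U)                      # row a is U[S[a]] (query)
    K = matmul(X, V)                      # row b is V[S[b]] (key)
    K_T = [list(col) for col in zip(*K)]  # r x |S|
    scores = matmul(Q, K_T)               # scores[a][b] = U[S[a]] . V[S[b]]
    # Unit value vectors: each product's output is the sum of its scores over S
    # (including itself), plus its base utility alpha_j as a bias term.
    z = [alpha[j] + sum(scores[a]) for a, j in enumerate(S)]
    z_max = max(z)                        # stabilize the final softmax
    w = [math.exp(v - z_max) for v in z]
    total = sum(w)
    return {j: wj / total for j, wj in zip(S, w)}

# Toy parameters (hypothetical): m = 4 products, rank r = 2.
alpha = [1.0, 0.5, 0.0, -0.2]
U = [[1.0, 0.0], [0.2, -0.3], [0.0, 0.5], [0.4, 0.1]]
V = [[0.0, 1.0], [0.3, 0.0], [1.0, -0.2], [0.0, 0.0]]
print(attention_choice_probs(alpha, U, V, {0, 2, 3}))
```

Because the scores are not softmax-normalized inside the head, summing them over $S$ recovers exactly the row sums of $UV^\top$ restricted to the offer set; a standard softmax-normalized head would compute a different quantity.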

Figures (2)

Figure 1: Detailed breakdown of experimental results on 4 weeks of data. Cross-entropy loss is reported across product categories, normalized to our model (Low-Rank Halo MNL).
Figure 2: Detailed breakdown of experimental results on 52 weeks of data. Cross-entropy loss is reported across product categories, normalized to our model (Low-Rank Halo MNL).
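Both figures report cross-entropy loss normalized to the LR-Halo MNL. A minimal sketch of that metric follows, assuming "normalized" means dividing each model's mean negative log-likelihood of the observed choices by the LR-Halo MNL's, so that values above 1 indicate a worse fit than the LR-Halo MNL. The helper names and toy predictions are hypothetical and are not results from the paper.

```python
import math

def cross_entropy(pred_probs, chosen):
    """Mean negative log-likelihood of observed choices.

    pred_probs[i]: dict mapping each offered product to its predicted
    probability in observation i; chosen[i]: the product actually chosen.
    """
    nll = [-math.log(p[c]) for p, c in zip(pred_probs, chosen)]
    return sum(nll) / len(nll)

def normalize_to_reference(losses, reference="LR-Halo MNL"):
    """Divide every model's loss by the reference model's loss (assumed scheme)."""
    ref = losses[reference]
    return {name: loss / ref for name, loss in losses.items()}

# Hypothetical toy inputs, not results from the paper.
preds_a = [{0: 0.7, 1: 0.3}, {0: 0.2, 1: 0.8}]
preds_b = [{0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}]
chosen = [0, 1]
losses = {"LR-Halo MNL": cross_entropy(preds_a, chosen),
          "MNL": cross_entropy(preds_b, chosen)}
print(normalize_to_reference(losses))
```

With this normalization the reference model always sits at exactly 1, which makes per-category comparisons readable even when absolute loss levels differ across categories.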

Theorems & Definitions (2)

Proposition 1
Proof of Proposition 1
