Modeling Choice via Self-Attention
Joohwan Ko, Andrew A. Li
TL;DR
The paper introduces a self-attention-based choice model, the Low-Rank Halo MNL (LR-Halo MNL), which generalizes the Halo-MNL through the low-rank decomposition $H = \mathrm{diag}(\alpha) + UV^\top$ and can be implemented as a simple network with a single self-attention head. It proves that sample complexity improves from $\Omega(m^2)$ for the Halo-MNL to $O(rm)$ for the LR-Halo MNL under a nonconvex, regularized estimation framework, which has a local optimum close to the true parameters when $n = \Omega(rm \log m)$. The authors establish a large-scale benchmark by expanding the IRI Academic Dataset to cover an entire year, 31 categories, and up to 20 products, and evaluate the model extensively against traditional and deep-learning-based choice models. Empirically, the LR-Halo MNL achieves superior predictive performance, particularly in long-term (52-week) analyses, which supports the practical relevance and robustness of the approach for assortment, inventory, and price optimization. Overall, the work pairs theoretical guarantees with empirical evidence that a low-rank choice model built on self-attention can outperform existing methods on realistic, large-scale data.
Abstract
Models of choice are a fundamental input to many now-canonical optimization problems in the field of Operations Management, including assortment, inventory, and price optimization. Naturally, accurate estimation of these models from data is a critical step in applying these optimization problems in practice. Concurrently, recent advancements in deep learning have sparked interest in integrating these techniques into choice modeling. However, there is a noticeable research gap at the intersection of deep learning and choice modeling, particularly in work with both theoretical and empirical foundations. Thus motivated, we propose a choice model that is the first to successfully (both theoretically and practically) leverage a modern neural network architectural concept (self-attention). Theoretically, we show that our attention-based choice model is a low-rank generalization of the Halo Multinomial Logit (Halo-MNL) model. We prove that whereas the Halo-MNL requires $\Omega(m^2)$ data samples to estimate, where $m$ is the number of products, our model supports a natural nonconvex estimator (in particular, that which a standard neural network implementation would apply) which admits a near-optimal stationary point with $O(m)$ samples. Additionally, we establish the first realistic-scale benchmark for choice model estimation on real data and conduct the most extensive evaluation of existing models to date, thereby highlighting our model's superior performance.
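To make the low-rank structure concrete, the following is a minimal NumPy sketch of the decomposition $H = \mathrm{diag}(\alpha) + UV^\top$ and of the resulting parameter counts. It is illustrative only and is not the authors' implementation. The utility form used here (the utility of product $j$ in an offered set $S$ is $\sum_{k \in S} H_{jk}$, followed by a softmax restricted to $S$) is an assumption made for this example, as are the function names `lr_halo_matrix` and `choice_probabilities`, the random parameter values, and the choices $m = 20$, $r = 2$.

```python
import numpy as np


def lr_halo_matrix(alpha, U, V):
    """Build H = diag(alpha) + U V^T from the low-rank factors."""
    return np.diag(alpha) + U @ V.T


def choice_probabilities(H, assortment):
    """Choice probabilities over all m products for one offered set.

    Assumed utility form (not taken from the paper): product j's utility
    is the sum of H[j, k] over offered products k, and probabilities are
    a softmax restricted to the offered set.
    """
    m = H.shape[0]
    x = np.zeros(m)
    x[assortment] = 1.0                      # indicator vector of the offered set
    utilities = H @ x                        # sum_k H[j, k] * x[k]
    logits = np.where(x > 0, utilities, -np.inf)
    logits = logits - logits[x > 0].max()    # stabilize the softmax
    weights = np.exp(logits)                 # exp(-inf) = 0 for unoffered products
    return weights / weights.sum()


rng = np.random.default_rng(0)
m, r = 20, 2                                 # illustrative sizes, chosen for this sketch
alpha = rng.normal(size=m)
U = rng.normal(scale=0.1, size=(m, r))
V = rng.normal(scale=0.1, size=(m, r))

H = lr_halo_matrix(alpha, U, V)
probs = choice_probabilities(H, [0, 3, 7])

# Parameter counts: a full m x m matrix versus alpha plus the two factors.
n_params_full = m * m
n_params_low_rank = m + 2 * m * r
```

With these sizes the full matrix has $m^2 = 400$ free entries while the low-rank form has $m + 2rm = 100$, which is the kind of reduction that the $\Omega(m^2)$ versus $O(rm)$ comparison in the TL;DR reflects. How the factors map onto the query, key, and value projections of a self-attention head is not spelled out here, so the sketch leaves that correspondence out.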
