Near-Optimal Real-Time Personalization with Simple Transformers

Lin An; Andrew A. Li; Vaisnavi Nemala; Gabriel Visotsky

TL;DR

The paper tackles real-time personalization by restricting transformers to a single self-attention layer (simple transformers) to enable efficient optimization. It proves that simple transformers can capture complex set effects such as sequential variety and complementarity/substitution, then designs a two-phase retrieval-and-ranking algorithm that achieves near-optimal performance with sublinear dependence on the catalog size under low non-negative rank assumptions. The approach is validated on Spotify and Trivago data, showing substantial accuracy gains over non-transformer baselines and competitive performance relative to deeper transformers, while offering faster real-time optimization than standard methods like kNN and Beam Search. This work provides a principled, scalable pathway to deploying transformer-based personalization in large-scale, latency-constrained environments.

Abstract

Real-time personalization has advanced significantly in recent years, with platforms using machine learning models to predict user preferences from rich behavioral data about each user. Traditional approaches usually rely on embedding-based models to capture user preferences, and then reduce the final optimization task to a nearest-neighbors search, which can be performed extremely fast. However, these models struggle to capture complex user behaviors, which are essential for making accurate recommendations. Transformer-based models, on the other hand, are known for their practical ability to model sequential behaviors, and hence have recently been used intensively in personalization to overcome these limitations. However, optimizing recommendations under transformer-based models is challenging due to their complicated architectures. In this paper, we address this challenge by considering a specific class of transformers, showing its ability to represent complex user preferences, and developing efficient algorithms for real-time personalization. We focus on a particular class of transformers, called simple transformers, which contain a single self-attention layer. We show that simple transformers are capable of capturing complex user preferences. We then develop an algorithm that enables fast optimization of recommendation tasks based on simple transformers. Our algorithm achieves near-optimal performance in sublinear time. Finally, we demonstrate the effectiveness of our approach through an empirical study on datasets from Spotify and Trivago. Our experimental results show that (1) simple transformers can predict user preferences substantially more accurately than non-transformer models and nearly as accurately as more complex transformers, and (2) our algorithm completes simple-transformer-based recommendation tasks quickly and effectively.
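
To make the notion of a single self-attention layer concrete, the following is a minimal NumPy sketch and not the authors' model or algorithm. It assumes a single attention head, no positional encoding, sum pooling, and a linear readout; the names `simple_transformer_score` and `best_pair`, the random weights, and the catalog size are all hypothetical. The brute-force search over pairs only illustrates why optimizing over sets under such a model becomes expensive as the catalog grows.

```python
import itertools
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def simple_transformer_score(X, Wq, Wk, Wv, w_out):
    """Score a set of items with one self-attention layer (illustrative only).

    X: (k, d) embeddings of the k items in the candidate set.
    Returns a scalar: sum-pooled attention output passed through a linear readout.
    """
    d = Wq.shape[1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(d), axis=-1)  # (k, k) attention weights
    H = A @ V                                   # (k, d) attended representations
    return float(H.sum(axis=0) @ w_out)

def best_pair(E, Wq, Wk, Wv, w_out):
    # Exhaustive search over all item pairs: O(n^2) model evaluations.
    best, best_val = None, -np.inf
    for i, j in itertools.combinations(range(len(E)), 2):
        val = simple_transformer_score(E[[i, j]], Wq, Wk, Wv, w_out)
        if val > best_val:
            best, best_val = (i, j), val
    return best, best_val

# Hypothetical catalog and randomly initialized weights (assumptions, not trained).
rng = np.random.default_rng(0)
n_items, d = 50, 8
E = rng.normal(size=(n_items, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
w_out = rng.normal(size=d)

pair, value = best_pair(E, Wq, Wk, Wv, w_out)
print(pair, value)
```

Because the attention weights depend on every item in the set, an item's contribution to the score changes with its companions, which is why the optimization does not reduce to a single nearest-neighbors query in this sketch.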
