Near-Optimal Real-Time Personalization with Simple Transformers

Lin An, Andrew A. Li, Vaisnavi Nemala, Gabriel Visotsky

TL;DR

The paper tackles real-time personalization by restricting transformers to a single self-attention layer (simple transformers) to enable efficient optimization. It proves that simple transformers can capture complex set effects such as sequential variety and complementarity/substitution, then designs a two-phase retrieval-and-ranking algorithm that achieves near-optimal performance with sublinear dependence on the catalog size under low non-negative rank assumptions. The approach is validated on Spotify and Trivago data, showing substantial accuracy gains over non-transformer baselines and competitive performance relative to deeper transformers, while offering faster real-time optimization than standard methods like kNN and Beam Search. This work provides a principled, scalable pathway to deploying transformer-based personalization in large-scale, latency-constrained environments.

Abstract

Real-time personalization has advanced significantly in recent years, with platforms utilizing machine learning models to predict user preferences based on rich behavioral data on each individual user. Traditional approaches usually rely on embedding-based machine learning models to capture user preferences, and then reduce the final optimization task to nearest-neighbor search, which can be performed extremely fast. However, these models struggle to capture complex user behaviors, which are essential for making accurate recommendations. Transformer-based models, on the other hand, are known for their practical ability to model sequential behaviors, and hence have been intensively used in personalization recently to overcome these limitations. However, optimizing recommendations under transformer-based models is challenging due to their complicated architectures. In this paper, we address this challenge by considering a specific class of transformers, showing its ability to represent complex user preferences, and developing efficient algorithms for real-time personalization. We focus on a particular set of transformers, called simple transformers, which contain a single self-attention layer. We show that simple transformers are capable of capturing complex user preferences. We then develop an algorithm that enables fast optimization of recommendation tasks based on simple transformers. Our algorithm achieves near-optimal performance in sub-linear time. Finally, we demonstrate the effectiveness of our approach through an empirical study on datasets from Spotify and Trivago. Our experimental results show that (1) simple transformers can model and predict user preferences substantially more accurately than non-transformer models and nearly as accurately as more complex transformers, and (2) our algorithm completes simple-transformer-based recommendation tasks quickly and effectively.
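
To make the optimization problem concrete, the following is a minimal sketch of scoring a candidate recommendation with a single self-attention layer and then finding the best candidate by exhaustive search. It is not the paper's model or algorithm: the weight names (`Wq`, `Wk`, `Wv`, `w_out`), the sum-pooled linear scoring head, and the toy sizes are all assumptions made for illustration, and the sketch omits positional encodings, so it scores sets of items rather than ordered sequences.

```python
import itertools
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def simple_transformer_score(X, Wq, Wk, Wv, w_out):
    """Score k item embeddings X (k x d) with one self-attention layer.

    The pooled linear head (sum over positions, then dot with w_out)
    is a hypothetical choice, not taken from the paper.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)  # k x k attention weights
    H = A @ V                                            # attended representations
    return float(H.sum(axis=0) @ w_out)

def brute_force_best(E, k, params):
    """Evaluate every size-k subset of the catalog E (n x d).

    This costs on the order of n^k model evaluations, which is the
    cost a sub-linear-time method is meant to avoid.
    """
    best_set, best_val = None, -np.inf
    for subset in itertools.combinations(range(E.shape[0]), k):
        val = simple_transformer_score(E[list(subset)], *params)
        if val > best_val:
            best_set, best_val = subset, val
    return best_set, best_val

# Toy instance with random embeddings and weights (illustrative only).
rng = np.random.default_rng(0)
n, d, k = 8, 4, 3
E = rng.normal(size=(n, d))
params = (rng.normal(size=(d, d)), rng.normal(size=(d, d)),
          rng.normal(size=(d, d)), rng.normal(size=d))
best_set, best_val = brute_force_best(E, k, params)
```

Even in this toy setting the search visits every one of the 56 size-3 subsets of 8 items; with a realistic catalog the same enumeration is infeasible, which is the gap the abstract's sub-linear-time algorithm is meant to close.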

Paper Structure

This paper contains 68 sections, 28 theorems, 199 equations, 4 figures, 2 tables, and 9 algorithms.

Key Result

Theorem 1

Under additional (rank) assumptions on the simple transformer, given any $n,k\in\mathbb{N}$ and $\epsilon>0$, there exists an algorithm that achieves $\textup{ALG}\geq(1-\epsilon)\textup{OPT}$ with expected amortized runtime for functions $c,\mu$ satisfying $c(\epsilon,k),\mu(\epsilon)>0$. Here $\tilde{O}$ hides factors of order $n^{o(1)}$.
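
One way to read the approximation guarantee, under notation that is assumed here rather than taken from the statement above: let $f$ denote the simple-transformer objective, $\mathcal{S}_{n,k}$ the set of feasible recommendations of size $k$ drawn from a catalog of $n$ items, and $\hat{S}$ the recommendation returned by the algorithm. Then

$$
\textup{OPT}=\max_{S\in\mathcal{S}_{n,k}} f(S),
\qquad
\textup{ALG}=f(\hat{S})\geq(1-\epsilon)\,\textup{OPT}.
$$

For instance, with $\epsilon=0.1$ the returned recommendation would attain at least $90\%$ of the best achievable objective value.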

Figures (4)

  • Figure 1: Architecture of the simple transformer used in the Spotify experiment.
  • Figure 2: Architecture of the transformer used in the Trivago experiment. The simple transformer contained only the decoder, that is, a single self-attention layer.
  • Figure 3: Performance of four algorithms. The $x$-axis is the number of candidate solutions generated by each algorithm, and the $y$-axis is the objective value of the current best candidate solution. Each plot is averaged across 100 instances.
  • Figure 4: Scatter plots of candidate solutions. Each point corresponds to a pair of matched candidate solutions produced by our algorithm and Beam Search. The $x$-axis represents our algorithm's objective value and the $y$-axis represents the Beam Search candidate solution's objective value. Each plot contains 2500 data points given by 25 candidate solutions in each of the 100 instances.
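
The quantity plotted in Figure 3 (objective value of the current best candidate as a function of the number of candidates generated, averaged over instances) can be computed along the following lines. This is a hedged sketch: the function names and the toy objective values are invented for illustration and are not results from the paper.

```python
import numpy as np

def best_so_far(values):
    """Running maximum: entry i is the best objective among the first i+1 candidates."""
    return np.maximum.accumulate(np.asarray(values, dtype=float))

def average_curve(per_instance_values):
    """Average the best-so-far curves across instances (one row per instance)."""
    curves = np.vstack([best_so_far(v) for v in per_instance_values])
    return curves.mean(axis=0)

# Two made-up instances, each with four candidate objective values.
toy = [[0.2, 0.5, 0.4, 0.9],
       [0.6, 0.1, 0.7, 0.3]]
curve = average_curve(toy)
```

By construction each per-instance curve is non-decreasing, so the averaged curve is as well; an algorithm whose curve rises faster reaches good solutions with fewer candidates.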

Theorems & Definitions (32)

  • Theorem 1: Informal
  • Proposition 1: Corollary 1 in andoni2015practical
  • Proposition 2: Theorem 1 in ailon2009fast
  • Proposition 3: Theorem 3.1 in arya1998optimal
  • Definition 1: $\epsilon$-Approximate $k$-Nearest Neighbor Algorithm
  • Lemma 1
  • Proposition 4
  • Definition 2: Simple Transformer
  • Proposition 5
  • Definition 3: Simple-Transformer Based Recommendations
  • ...and 22 more