Self-Attentive Sequential Recommendation

Wang-Cheng Kang, Julian McAuley

TL;DR

The paper addresses sequential item recommendation by proposing SASRec, a self-attention-based model that captures long-range user preferences while keeping MC-like parsimony by attending to a small number of relevant past actions. The model consists of an embedding layer, causally masked self-attention blocks, and an MF-style prediction layer, combined with residual connections, layer normalization, and dropout. It achieves state-of-the-art results on both sparse and dense datasets and is an order of magnitude faster than comparable CNN/RNN-based models. Ablations, complexity analysis, and attention visualizations show the importance of positional encoding, depth, regularization, and adaptive attention, supporting SASRec's effectiveness and scalability. The findings suggest that self-attention offers a principled, parallelizable approach to sequential recommendation that generalizes several classic models and adapts to data density.

Abstract

Sequential dynamics are a key feature of many modern recommender systems, which seek to capture the 'context' of users' activities on the basis of actions they have performed recently. To capture such patterns, two approaches have proliferated: Markov Chains (MCs) and Recurrent Neural Networks (RNNs). Markov Chains assume that a user's next action can be predicted on the basis of just their last (or last few) actions, while RNNs in principle allow for longer-term semantics to be uncovered. Generally speaking, MC-based methods perform best in extremely sparse datasets, where model parsimony is critical, while RNNs perform better in denser datasets where higher model complexity is affordable. The goal of our work is to balance these two goals, by proposing a self-attention based sequential model (SASRec) that allows us to capture long-term semantics (like an RNN), but, using an attention mechanism, makes its predictions based on relatively few actions (like an MC). At each time step, SASRec seeks to identify which items are 'relevant' from a user's action history, and use them to predict the next item. Extensive empirical studies show that our method outperforms various state-of-the-art sequential models (including MC/CNN/RNN-based approaches) on both sparse and dense datasets. Moreover, the model is an order of magnitude more efficient than comparable CNN/RNN-based models. Visualizations on attention weights also show how our model adaptively handles datasets with various density, and uncovers meaningful patterns in activity sequences.
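
The core mechanism described above, attending only to past actions at each time step, can be illustrated with a minimal sketch of causally masked scaled dot-product self-attention. This is not the authors' implementation: the function names (matmul, softmax, causal_self_attention) and the toy inputs are assumptions made for illustration, and the sketch omits positional embeddings, the feed-forward layer, residual connections, layer normalization, and dropout.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(E, Wq, Wk, Wv):
    """Single-head self-attention where position i sees only positions j <= i.

    E is an n x d matrix of item embeddings for one user's action sequence
    (hypothetical input); Wq, Wk, Wv are d x d projection matrices.
    Returns (outputs, weights), where weights is an n x n lower-triangular
    matrix of attention weights.
    """
    Q, K, V = matmul(E, Wq), matmul(E, Wk), matmul(E, Wv)
    n, d = len(E), len(Q[0])
    weights = []
    for i in range(n):
        # Score the query at step i against keys at steps 0..i only.
        scores = [sum(q * k for q, k in zip(Q[i], K[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        # Future positions receive exactly zero weight (the causal mask).
        weights.append(softmax(scores) + [0.0] * (n - i - 1))
    return matmul(weights, V), weights


if __name__ == "__main__":
    # Toy sequence of three 2-dimensional item embeddings (illustrative values).
    E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    outputs, weights = causal_self_attention(E, identity, identity, identity)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each row of the weight matrix sums to 1 and has zeros above the diagonal, so the representation at a given step depends only on earlier actions. A first-order Markov chain would correspond to the special case where all of a row's weight sits on the most recent action.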

Paper Structure

This paper contains 24 sections, 11 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: A simplified diagram showing the training process of SASRec. At each time step, the model considers all previous items, and uses attention to 'focus on' items relevant to the next action.
  • Figure 2: Effect of the latent dimensionality $d$ on ranking performance (NDCG@10).
  • Figure 3: Training efficiency on ML-1M. SASRec is an order of magnitude faster than CNN/RNN-based recommendation methods in terms of training time per epoch and in total.
  • Figure 4: Visualizations of average attention weights on positions at different time steps. For comparison, the heatmap of a first-order Markov chain based model would be a diagonal matrix.
  • Figure 5: Visualization of average attention between movies from four categories. This shows our model can uncover items' attributes, and assigns larger weights between similar items.