Revisiting Self-Attentive Sequential Recommendation
Zan Huang
TL;DR
This work revisits self-attentive sequential recommendation by examining SASRec’s limitations in personalization and embedding usage, and by proposing a focused program of experiments to scale the approach. It highlights issues such as the absence of explicit user embeddings, misuse of padding and positional embeddings, and data-fitting biases from offline training, then outlines corrections spanning embedding handling, autoregressive evaluation, tokenization, embedding duality, and sampling. The key contributions are detailed diagnostics of SASRec’s components and a concrete experimental roadmap aimed at improving personalization, robustness, and scalability for hyperscale recommender systems. These findings are relevant to deploying efficient, accurate sequential recommenders in industry and can guide design choices for next-generation systems.
Abstract
Recommender systems are ubiquitous in online services, where they drive business, and many of these systems deploy sequential recommender models to enhance personalization. Using a transformer decoder as the sequential recommender was proposed years ago and remains a strong baseline in recent work. However, this kind of sequential recommender has not scaled up as well as language models have. Many details of the classical self-attentive sequential recommender deserve to be revisited, and new experiments may lead to new findings without changing the general model structure, which was the focus of many previous works. In this paper, we examine these details and propose new experimental methodologies for future research on sequential recommendation, in the hope of motivating further exploration in this area.
