Your Causal Self-Attentive Recommender Hosts a Lonely Neighborhood
Yueqi Wang, Zhankui He, Zhenrui Yue, Julian McAuley, Dong Wang
TL;DR
This work addresses the unresolved performance trade-offs between auto-encoding (AE) and auto-regressive (AR) self-attention in sequential recommendation. It introduces two theoretically grounded analyses of the attention matrix (its sparsity and its rank-$k$ low-rank approximation) and a modular experimental framework, ModSAR, to study AE vs. AR across vanilla models, model variants, and HuggingFace models. The findings show that AR attention exhibits a sparse local neighborhood bias and stores richer data dynamics, which requires higher-rank representations. Empirically, AR outperforms AE overall across five datasets and design spaces, including NLP-model integrations. The paper argues for adopting AR as the more robust starting point for future self-attentive recommender designs and provides open-source tooling to accelerate research and design space exploration.
Abstract
In sequential recommendation, a pivotal question is how bi-directional/auto-encoding (AE) and uni-directional/auto-regressive (AR) attention mechanisms compare, and conclusions about which is architecturally and empirically superior remain inconclusive. Previous comparisons primarily summarize existing works to identify a consensus or conduct ablation studies on peripheral modeling techniques, such as the choice of loss function. Far fewer efforts have been made toward (1) theoretical and (2) extensive empirical analysis of the self-attention module itself, the structure on which performance and design insights should be anchored. In this work, we first provide a comprehensive theoretical analysis of the AE/AR attention matrices with respect to (1) sparse local inductive bias, a.k.a. neighborhood effects, and (2) low-rank approximation. Analytical metrics reveal that AR attention exhibits sparse neighborhood effects suitable for generally sparse recommendation scenarios. Second, to support our theoretical analysis, we conduct extensive experiments comparing AE/AR attention on five popular benchmarks, with AR performing better overall. The reported empirical results are based on our experimental pipeline, Modularized Design Space for Self-Attentive Recommender (ModSAR), which supports adaptive hyperparameter tuning, a modularized design space, and HuggingFace plug-ins. We invite the recommendation community to use and contribute to ModSAR to (1) conduct further module-level and model-level examination beyond the AE/AR comparison and (2) accelerate state-of-the-art model design. Lastly, we shed light on future design choices for performant self-attentive recommenders. We make our pipeline implementation and data available at https://github.com/yueqirex/SAR-Check.
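To make the AE/AR distinction concrete, the following is a minimal NumPy sketch, not the ModSAR implementation. It builds a bi-directional (AE) and a causally masked (AR) softmax attention matrix from the same random queries and keys, then computes two illustrative quantities. The abstract does not define the paper's metrics, so `sparsity` (fraction of near-zero weights) and `rank_k_error` (relative Frobenius error of a truncated SVD) are assumed stand-ins, and all function names are hypothetical.

```python
import numpy as np

def attention_matrix(queries, keys, causal):
    """Row-stochastic softmax attention; causal=True masks future positions (AR)."""
    n, d = queries.shape
    scores = queries @ keys.T / np.sqrt(d)
    if causal:
        # Position i may only attend to positions j <= i.
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def sparsity(attn, eps=1e-6):
    """Fraction of attention weights below eps (assumed stand-in metric)."""
    return float((attn < eps).mean())

def rank_k_error(attn, k):
    """Relative Frobenius error of the best rank-k approximation (truncated SVD)."""
    s = np.linalg.svd(attn, compute_uv=False)
    return float(np.sqrt((s[k:] ** 2).sum() / (s ** 2).sum()))

# Random inputs for illustration only; these are not trained recommender weights.
rng = np.random.default_rng(0)
n, d = 8, 4
queries = rng.normal(size=(n, d))
keys = rng.normal(size=(n, d))

ae = attention_matrix(queries, keys, causal=False)  # bi-directional
ar = attention_matrix(queries, keys, causal=True)   # uni-directional

print("sparsity     AE:", sparsity(ae), " AR:", sparsity(ar))
print("rank-2 error AE:", rank_k_error(ae, 2), " AR:", rank_k_error(ar, 2))
```

The causal mask alone zeroes every entry above the diagonal, so the AR matrix is lower triangular with a strictly positive diagonal and therefore has full rank. Random inputs say nothing about trained models; the sketch only shows how such quantities could be computed.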
