Gated Rotary-Enhanced Linear Attention for Long-term Sequential Recommendation
Juntao Hu, Wei Zhou, Huayi Shen, Xiao Du, Jie Liao, Min Gao, Jun Zeng, Junhao Wen
TL;DR
This work addresses the inefficiency of modeling long-term user sequences in sequential recommendation by combining rotary position encoding (RoPE) with linear attention. It introduces RELA, which applies RoPE within a linear-attention framework, and GRELA, which adds a local shortcut and a SiLU-based gating mechanism to distinguish short-term bursts of interest from genuine long-term shifts. Empirical results on four public benchmarks show the resulting model, RecGRELA, achieving state-of-the-art or competitive performance with substantially reduced memory usage, and ablation analyses confirm the importance of gating, RoPE, and local modeling. The approach offers a scalable, accurate alternative to Transformer- or RNN-based SRS models, with potential for extension to session-based and multi-modal settings.
Abstract
In Sequential Recommendation Systems (SRSs), Transformer models have demonstrated remarkable performance but face computational and memory cost challenges, especially when modeling long-term user behavior sequences. Because of its quadratic complexity in sequence length, the dot-product attention mechanism in Transformers becomes expensive on long sequences. By approximating dot-product attention with carefully designed mapping functions, linear attention provides a more efficient alternative with linear complexity. However, existing linear attention methods face three limitations: 1) they often use learnable position encodings, which incur extra computational cost in long-sequence scenarios; 2) they may not sufficiently account for users' fine-grained local preferences (short-lived bursts of interest); and 3) they attempt to capture temporary activities but often confuse them with stable, long-term interests, which can lead to unclear or less effective recommendations. To remedy these drawbacks, we propose a long-term sequential Recommendation model with Gated Rotary-Enhanced Linear Attention (RecGRELA). Specifically, we first propose a Rotary-Enhanced Linear Attention (RELA) module that uses rotary position encodings to efficiently model long-range dependencies within the user's historical information. We then introduce a local short operation that adds the local preferences of interactions, and we provide theoretical insight into this design. We further introduce a SiLU-based gating mechanism for RELA (GRELA) to help the model tell whether a user behavior reflects a short-term, local interest or a real change in long-term tastes. Experimental results on four public benchmark datasets show that RecGRELA achieves state-of-the-art performance compared with existing SRSs based on Recurrent Neural Networks, Transformers, and Mamba, while keeping memory overhead low.
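
To make the ideas in the abstract concrete, the following is a minimal sketch of causal linear attention with rotary position encoding and a SiLU gate, written in plain Python. It is not the authors' implementation: the feature map (elu(x) + 1), the choice to rotate only the numerator terms while normalizing with the unrotated features, the gate input `G`, and all function names are assumptions made for illustration, and the local short operation is omitted. The sketch shows that a running state of fixed size reproduces the quadratic formulation.

```python
import math

def feature_map(x):
    # elu(x) + 1: a positive feature map often used in linear attention (assumed here).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def rope(x, pos, base=10000.0):
    # Rotate each pair (x[i], x[i+1]) by an angle proportional to the position.
    # Assumes len(x) is even.
    d, out = len(x), []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

def silu(v):
    return v / (1.0 + math.exp(-v))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def linear_attention(Q, K, V):
    # Causal linear attention: keep a d x d_v state S and a d-vector z,
    # so the cost grows linearly with the sequence length.
    d, dv = len(Q[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    out = []
    for t, (q, k, v) in enumerate(zip(Q, K, V)):
        fq, fk = feature_map(q), feature_map(k)
        rq, rk = rope(fq, t), rope(fk, t)
        for i in range(d):
            z[i] += fk[i]                 # normalizer uses unrotated features
            for j in range(dv):
                S[i][j] += rk[i] * v[j]   # accumulate rotated key-value outer product
        norm = dot(fq, z)
        out.append([sum(rq[i] * S[i][j] for i in range(d)) / norm
                    for j in range(dv)])
    return out

def quadratic_reference(Q, K, V):
    # Same quantity computed pairwise over all (t, s <= t), for comparison only.
    out = []
    for t, q in enumerate(Q):
        fq = feature_map(q)
        rq = rope(fq, t)
        w = [dot(rq, rope(feature_map(K[s]), s)) for s in range(t + 1)]
        norm = sum(dot(fq, feature_map(K[s])) for s in range(t + 1))
        out.append([sum(w[s] * V[s][j] for s in range(t + 1)) / norm
                    for j in range(len(V[0]))])
    return out

def gated(out, G):
    # Element-wise SiLU gate; G is a hypothetical gate input of the same shape as out.
    return [[o * silu(g) for o, g in zip(row, grow)] for row, grow in zip(out, G)]

# Small deterministic inputs: 6 positions, key/query width 4, value width 3.
Q = [[math.sin(1.0 + 3 * t + i) for i in range(4)] for t in range(6)]
K = [[math.cos(2.0 + 5 * t + i) for i in range(4)] for t in range(6)]
V = [[math.sin(0.5 + 7 * t + j) for j in range(3)] for t in range(6)]
fast = linear_attention(Q, K, V)
slow = quadratic_reference(Q, K, V)
```

In this sketch the state `S` has a fixed size of d × d_v regardless of how many interactions have been processed, which is the source of the memory savings that linear attention offers over storing a full attention matrix. The rotation introduces no learned position parameters, and the inner product of two rotated vectors depends only on the offset between their positions.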
