BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations
Mengyang Ma, Xiaopeng Li, Wanyu Wang, Zhaocheng Du, Jingtong Gao, Pengyue Jia, Yuyang Ye, Yiqi Wang, Yunpeng Weng, Weihong Luo, Xiao Han, Xiangyu Zhao
TL;DR
BlossomRec tackles the quadratic cost of attention in long-sequence sequential recommendation with a block-level fused sparse attention mechanism that models user interests through two sparse pathways, LTIS (long-term) and STIS (short-term). The two pathways are fused through a learnable gate, which adaptively balances them across sequence lengths and improves stability. The approach achieves state-of-the-art or competitive accuracy while substantially reducing training and inference memory, and it scales well to long histories. Empirical results on four public datasets, together with ablations and parameter studies, validate its effectiveness and efficiency and give guidance on hyperparameter settings for practical deployment.
Abstract
Transformer structures have been widely used in sequential recommender systems (SRS). However, as user interaction histories grow longer, computation time and memory requirements grow as well, mainly because of the standard attention mechanism. Although many methods based on efficient attention and state space models (SSMs) exist, these approaches struggle to model long sequences effectively and may perform unstably on short sequences. To address these challenges, we design a sparse attention mechanism, BlossomRec, which models both long-term and short-term user interests through attention computation to achieve stable performance across sequences of varying lengths. Specifically, we categorize user interests in recommender systems into long-term and short-term interests and compute them using two distinct sparse attention patterns, combining the results through a learnable gated output. Theoretically, this significantly reduces the number of interactions participating in attention computation. Extensive experiments on four public datasets demonstrate that BlossomRec, when integrated with state-of-the-art Transformer-based models, achieves comparable or even superior performance while significantly reducing memory usage, providing strong evidence of BlossomRec's efficiency and effectiveness. The code is available at https://github.com/ronineume/BlossomRec.
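
To make the idea of two sparse attention patterns combined by a learnable gate concrete, the following is a minimal NumPy sketch, not the released implementation. The abstract does not specify the two patterns, so the choices here are assumptions: the long-term path keeps, for each query, the top-scoring key blocks (scored against mean-pooled block summaries), the short-term path uses a causal sliding window, and the gate is a per-position sigmoid of a linear projection of the query. All function names, `gate_w`, and the default values of `block`, `top_blocks`, and `window` are hypothetical.

```python
import numpy as np

def sparse_masks(q, k, block=4, top_blocks=2, window=4):
    """Build two boolean (n, n) masks; both patterns are illustrative assumptions."""
    n = q.shape[0]
    idx = np.arange(n)
    causal = idx[:, None] >= idx[None, :]

    # Short-term path (assumed): each query sees only its last `window` positions.
    short_mask = causal & (idx[:, None] - idx[None, :] < window)

    # Long-term path (assumed): each query keeps its top-scoring key blocks.
    block_id = idx // block
    n_blocks = int(block_id.max()) + 1
    # Mean-pooled block summaries. Note: the summary of a query's own block
    # includes later keys; a strict implementation would avoid this.
    k_blocks = np.stack([k[block_id == b].mean(axis=0) for b in range(n_blocks)])
    block_scores = q @ k_blocks.T                              # (n, n_blocks)
    visible = (np.arange(n_blocks)[None, :] * block) <= idx[:, None]
    block_scores = np.where(visible, block_scores, -np.inf)
    top = np.argsort(-block_scores, axis=-1)[:, :top_blocks]
    chosen = np.zeros((n, n_blocks), dtype=bool)
    np.put_along_axis(chosen, top, True, axis=-1)
    chosen &= visible                                          # drop not-yet-seen blocks
    long_mask = causal & chosen[:, block_id]
    return long_mask, short_mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to positions where mask is True."""
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def gated_sparse_attention(q, k, v, gate_w, **mask_kwargs):
    """Fuse the two sparse paths with a per-position sigmoid gate (assumed form)."""
    long_mask, short_mask = sparse_masks(q, k, **mask_kwargs)
    long_out = masked_attention(q, k, v, long_mask)
    short_out = masked_attention(q, k, v, short_mask)
    gate = 1.0 / (1.0 + np.exp(-(q @ gate_w)))                 # shape (n,)
    return gate[:, None] * long_out + (1.0 - gate)[:, None] * short_out
```

In this sketch each query attends to at most `top_blocks * block` keys on the long-term path and at most `window` keys on the short-term path, so the number of query-key pairs grows linearly with sequence length rather than quadratically. For clarity the masks are applied to a dense score matrix, so the sketch itself saves no memory; an efficient implementation would compute only the selected blocks.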
