A Systematic Reproducibility Study of BSARec for Sequential Recommendation
Jan Hutter, Hua Chang Bakker, Stan Fris, Madelon Bernardy, Yuanna Liu
TL;DR
This study systematically reproduces BSARec, a frequency-aware extension of Transformer-based sequential recommendation (SR), and extends the analysis to user-history frequency, alternative digital signal processing (DSP) techniques, and padding strategies. By introducing a scaled DC component metric, it shows that BSARec is advantageous for high-frequency user groups and that padding strategies can substantially influence performance, while DSP substitutions offer limited gains. Results are dataset-dependent, with notable improvements on LastFM and ML-1M and mixed outcomes on Yelp, which highlights the practical importance of inductive bias and data characteristics. Overall, the work underscores the significance of frequency-aware processing and padding choices in SR and calls for broader DSP exploration and more robust reproducibility practices.
Abstract
In sequential recommendation (SR), the self-attention mechanism of Transformer-based models acts as a low-pass filter, limiting their ability to capture high-frequency signals that reflect short-term user interests. To overcome this, BSARec augments the Transformer encoder with a frequency layer that rescales high-frequency components using the Fourier transform. However, the overall effectiveness of BSARec and the roles of its individual components have yet to be systematically validated. We reproduce BSARec and show that it outperforms other SR methods on some datasets. To empirically assess whether BSARec improves performance on high-frequency signals, we propose a metric to quantify user history frequency and evaluate SR methods across different user groups. We compare digital signal processing (DSP) techniques and find that the discrete wavelet transform (DWT) offers only slight improvements over Fourier transforms, and that DSP methods provide no clear advantage over simple residual connections. Finally, we explore padding strategies and find that non-constant padding significantly improves recommendation performance, whereas constant padding hinders the frequency rescaler's ability to capture high-frequency signals.
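To make the idea of rescaling high-frequency components concrete, the following is a minimal NumPy sketch and not the BSARec implementation. It assumes that a sequence of item embeddings is split into a low-pass part (the lowest `cutoff` Fourier bins along the sequence axis) and a high-frequency remainder, which is then scaled by a factor `beta`. The function name, `cutoff`, and `beta` are hypothetical and chosen for illustration only; in a trained model the scaling factor would typically be learned.

```python
import numpy as np

def rescale_high_frequencies(x, cutoff, beta):
    """Illustrative frequency rescaler (hypothetical helper, not the authors' code).

    x      : array of shape (seq_len, dim), one embedding per sequence position
    cutoff : number of lowest frequency bins treated as the low-pass part (assumed)
    beta   : scaling factor applied to the high-frequency remainder (assumed)
    """
    seq_len = x.shape[0]
    # Real FFT along the sequence (time) axis.
    spectrum = np.fft.rfft(x, axis=0)
    # Keep only the lowest `cutoff` bins; zero out the rest.
    low_spectrum = np.zeros_like(spectrum)
    low_spectrum[:cutoff] = spectrum[:cutoff]
    # Back to the time domain: this is the low-pass component.
    low = np.fft.irfft(low_spectrum, n=seq_len, axis=0)
    # Everything the low-pass filter removed is the high-frequency component.
    high = x - low
    return low + beta * high

# Toy usage: 6 positions, 2 embedding dimensions.
x = np.arange(12, dtype=float).reshape(6, 2)
unchanged = rescale_high_frequencies(x, cutoff=1, beta=1.0)  # beta = 1 leaves x as-is
dc_only = rescale_high_frequencies(x, cutoff=1, beta=0.0)    # only the DC part remains
```

With `cutoff=1` the low-pass part reduces to the per-dimension mean over the sequence (the DC component), so `beta` directly controls how much of the position-to-position variation is kept. Note also that the Fourier transform acts on the full sequence, padded positions included, so the choice of padding values changes the spectrum that the rescaler operates on.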
