A Reproducible Analysis of Sequential Recommender Systems
Filippo Betello, Antonio Purificato, Federico Siciliano, Giovanni Trappolini, Andrea Bacciu, Nicola Tonellotto, Fabrizio Silvestri
TL;DR
The paper tackles reproducibility gaps in Sequential Recommender Systems (SRSs) by introducing EasyRec, a standardized framework for data preprocessing and model implementation that enables fair, repeatable evaluations across datasets. It conducts extensive, controlled experiments on multiple benchmarks, re-evaluating classic SRS models (e.g., GRU4Rec, SASRec, BERT4Rec, NARM, CORE) under consistent settings while tracking energy consumption and emissions. Key findings include that GRU4Rec can outperform the other models on the MovieLens datasets, while transformer-based models like SASRec excel with larger embedding sizes. Longer input sequences generally help attention-based models, though the effect is dataset-dependent. The work emphasizes the importance of standardized benchmarks and sustainability considerations, and releases its code as open source to support robust, comparable SRS research and benchmarking.
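The input sequence length mentioned above is a model hyperparameter: each user's history is cut or padded to a fixed length before it reaches the model. The sketch below illustrates one common way to do this, assuming item IDs start at 1 so that 0 can serve as padding. The helper `to_fixed_length` and the constant `PAD_ID` are hypothetical names for illustration, not part of EasyRec.

```python
from typing import List

PAD_ID = 0  # assumption: item IDs start at 1, so 0 is reserved for padding


def to_fixed_length(history: List[int], max_len: int) -> List[int]:
    """Keep the `max_len` most recent items and left-pad shorter histories.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    recent = history[-max_len:]  # the most recent interactions are at the end
    return [PAD_ID] * (max_len - len(recent)) + recent


if __name__ == "__main__":
    history = [12, 7, 33, 5, 19, 2]
    print(to_fixed_length(history, 4))  # [33, 5, 19, 2]
    print(to_fixed_length(history, 8))  # [0, 0, 12, 7, 33, 5, 19, 2]
```

A longer `max_len` exposes more of a user's past to the model (useful for attention-based models) at the cost of more computation and more padding for users with short histories.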
Abstract
Sequential Recommender Systems (SRSs) have emerged as a highly efficient approach to recommendation. By leveraging sequential data, SRSs can identify temporal patterns in user behaviour, significantly improving recommendation accuracy and relevance. Ensuring the reproducibility of these models is paramount for advancing research and facilitating comparisons between them. Existing works exhibit shortcomings in the reproducibility and replicability of their results, leading to inconsistent claims across papers. Our work fills these gaps by standardising data pre-processing and model implementations. We provide a comprehensive code resource, including a framework for developing SRSs, which establishes a foundation for consistent and reproducible experimentation. We conduct extensive experiments on several benchmark datasets, comparing various SRSs implemented in our resource. We challenge prevailing performance benchmarks, offering new insights into the SR domain. For instance, SASRec does not consistently outperform GRU4Rec. However, when the number of model parameters becomes substantial, SASRec starts to clearly dominate all the other SRSs. This discrepancy underscores the significant impact of the experimental configuration on the outcomes, and the importance of setting it up carefully to obtain precise and comprehensive results. Failure to do so can lead to significantly flawed conclusions, highlighting the need for rigorous experimental design and analysis in SRS research. Our code is available at https://github.com/antoniopurificato/recsys_repro_conf.
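The abstract does not spell out the pre-processing pipeline itself. As a minimal sketch, assuming a commonly used protocol (order each user's interactions chronologically, drop very short histories, and hold out the last item for testing and the second-to-last for validation), standardised pre-processing could look like the code below. The function `preprocess`, its `min_len` parameter and the output layout are hypothetical and are not EasyRec's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interaction = Tuple[int, int, int]  # (user_id, item_id, timestamp)


def preprocess(interactions: Iterable[Interaction], min_len: int = 3) -> Dict[int, dict]:
    """Hypothetical pre-processing: chronological sequences + leave-one-out split."""
    by_user: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))

    splits: Dict[int, dict] = {}
    for user in sorted(by_user):  # deterministic user order
        # Sort by timestamp, breaking ties by item ID so the order is deterministic
        seq = [item for _, item in sorted(by_user[user])]
        if len(seq) < min_len:
            continue  # too short to provide train, validation and test items
        splits[user] = {
            "train": seq[:-2],                 # everything except the last two items
            "valid": (seq[:-2], seq[-2]),      # (input sequence, target item)
            "test": (seq[:-1], seq[-1]),       # (input sequence, target item)
        }
    return splits


if __name__ == "__main__":
    data = [(1, 10, 3), (1, 11, 1), (1, 12, 2), (1, 13, 4), (2, 20, 1), (2, 21, 2)]
    for user, split in preprocess(data).items():
        print(user, split)  # user 2 is dropped: only 2 interactions
```

Making every step deterministic (sorted users, tie-broken timestamps, fixed split rule) means two runs on the same raw data produce identical splits, which is the property a shared pre-processing stage needs for results to be comparable across models.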
