Performance Comparison of Session-based Recommendation Algorithms based on GNNs

Faisal Shehzad, Dietmar Jannach

TL;DR

The paper investigates whether eight recent GNN-based session-based recommendation models truly outperform simpler baselines. By reimplementing and evaluating these models under a uniform protocol across three datasets, the authors find that simple methods often achieve superior $MRR@20$ and comparable or better $HR@20$, exposing weaknesses in prior comparisons. They also reveal that factors such as embedding size, random seeds, and tuning on test data can drastically influence results, calling for stricter experimental standards. The findings suggest that there is considerable room for methodological improvement and scope to exploit richer side information, with future work likely to explore transformer-based, attention-driven SBRS models under robust evaluation frameworks.

Abstract

In session-based recommendation settings, a recommender system has no access to long-term user profiles and thus has to base its suggestions on the user interactions that are observed in an ongoing session. Since such sessions can consist of only a small set of interactions, various approaches based on Graph Neural Networks (GNN) were recently proposed, as they allow us to integrate various types of side information about the items in a natural way. Unfortunately, a variety of evaluation settings are used in the literature, e.g., in terms of protocols, metrics and baselines, making it difficult to assess what represents the state of the art. In this work, we present the results of an evaluation of eight recent GNN-based approaches that were published in high-quality outlets. For a fair comparison, all models are systematically tuned and tested under identical conditions using three common datasets. We furthermore include k-nearest-neighbor and sequential rules-based models as baselines, as such models have previously exhibited competitive performance results for similar settings. To our surprise, the evaluation showed that the simple models outperform all recent GNN models in terms of the Mean Reciprocal Rank, which we used as an optimization criterion, and were only outperformed in three cases in terms of the Hit Rate. Additional analyses furthermore reveal that several other factors that are often not deeply discussed in papers, e.g., random seeds, can markedly impact the performance of GNN-based models. Our results therefore (a) point to continuing issues in the community in terms of research methodology and (b) indicate that there is ample room for improvement in session-based recommendation.
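The two metrics named in the abstract, Hit Rate and Mean Reciprocal Rank, can be illustrated with a minimal sketch. It assumes the usual next-item setting, in which each test case has one ranked recommendation list and one held-out target item, and it uses a cutoff of 20 as in the TL;DR. The function names `hit_rate_at_k` and `mrr_at_k` and the toy data are hypothetical and are not taken from the authors' evaluation code.

```python
from typing import Hashable, Sequence


def hit_rate_at_k(ranked_lists: Sequence[Sequence[Hashable]],
                  targets: Sequence[Hashable],
                  k: int = 20) -> float:
    """Fraction of test cases whose target item appears in the top-k list."""
    if not targets or len(ranked_lists) != len(targets):
        raise ValueError("need one non-empty ranked list per target")
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return hits / len(targets)


def mrr_at_k(ranked_lists: Sequence[Sequence[Hashable]],
             targets: Sequence[Hashable],
             k: int = 20) -> float:
    """Mean of 1/rank of the target item; a target outside the top-k counts as 0."""
    if not targets or len(ranked_lists) != len(targets):
        raise ValueError("need one non-empty ranked list per target")
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = list(ranked[:k])
        if target in top_k:
            # Ranks are 1-based: the first position contributes 1/1.
            total += 1.0 / (top_k.index(target) + 1)
    return total / len(targets)


# Toy data (hypothetical): three sessions, each with a ranked list and a target.
ranked_lists = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "d"]

hr = hit_rate_at_k(ranked_lists, targets, k=20)
mrr = mrr_at_k(ranked_lists, targets, k=20)
```

In this toy example the target is ranked first in one session, second in another, and is missing in the third. The Hit Rate counts both found targets equally, while the MRR additionally rewards the higher rank, which is why a model can lead on one metric and trail on the other.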

Paper Structure (9 sections, 1 figure, 6 tables)

Figures (1)

  • Figure 1: Distribution of MRR@20 values for different random seeds (RSC15)