Evaluating the performance-deviation of itemKNN in RecBole and LensKit

Michael Schmidt; Jannik Nitschke; Tim Prinz

TL;DR

This work compares ItemKNN implementations in RecBole and LensKit to quantify how implementation details affect Top-N ranking quality. The authors systematically align evaluation metrics and dissect the underlying similarity-matrix computations using four datasets and $k=20$, evaluating with $nDCG@10$, $Precision@10$, and $Recall@10$. They find that divergences in $nDCG$ largely stem from different ways of computing the similarity matrix and the IDCG term; after standardizing $nDCG$ and adopting a top-$k$ neighbor approach, the two libraries yield nearly identical $nDCG@10$ on all datasets. The work highlights the practical importance of consistent metric definitions and similarity computations in recommender-system research and suggests that RecBole’s top-$k$ pruning can reduce noise and improve stability, while LensKit can be adjusted to achieve comparable performance. The study provides actionable guidance for practitioners selecting a library and for developers implementing ItemKNN, underlining the sensitivity of evaluation to implementation details.
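To make the role of the IDCG term concrete, here is a minimal sketch of $nDCG@k$ with binary relevance under two normalization conventions. The function names and the `truncate_ideal` flag are hypothetical, introduced only for illustration; this summary does not state which convention each library uses, so the sketch only shows that the choice changes the score for an identical ranking.

```python
import math


def dcg_at_k(ranked, relevant, k):
    """Binary-relevance DCG over the first k recommended items."""
    return sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so rank 1 gets log2(2) = 1
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )


def ndcg_at_k(ranked, relevant, k, truncate_ideal=True):
    """nDCG@k with a switchable IDCG definition (illustrative assumption).

    truncate_ideal=True : ideal list holds min(k, |relevant|) items.
    truncate_ideal=False: ideal list holds all relevant items, even beyond k.
    """
    n_ideal = min(len(relevant), k) if truncate_ideal else len(relevant)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(n_ideal))
    return dcg_at_k(ranked, relevant, k) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    ranked = ["a", "b", "c"]             # a perfect top-3 list
    relevant = {"a", "b", "c", "d", "e"}  # but the user has 5 relevant items
    print(ndcg_at_k(ranked, relevant, k=3, truncate_ideal=True))
    print(ndcg_at_k(ranked, relevant, k=3, truncate_ideal=False))
```

With the truncated ideal list, a perfect top-3 ranking scores 1; with the untruncated one, the same ranking scores lower because the denominator also counts relevant items that could never fit into the list. A gap of this kind can appear between two libraries even when their recommendations are identical.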

Abstract

This study examines the performance of item-based k-Nearest Neighbors (ItemKNN) algorithms in the RecBole and LensKit recommender system libraries. Using four data sets (Anime, ModCloth, ML-100K, and ML-1M), we assess each library's efficiency, accuracy, and scalability, focusing primarily on normalized discounted cumulative gain (nDCG). Our results show that RecBole outperforms LensKit on two of three metrics on the ML-100K data set: it achieved an 18% higher nDCG, 14% higher precision, and 35% lower recall. To ensure a fair comparison, we adjusted LensKit's nDCG calculation to match RecBole's method. This alignment made the performance more comparable, with LensKit achieving an nDCG of 0.2540 and RecBole 0.2674. Differences in similarity matrix calculations were identified as the main cause of the remaining performance deviations. After modifying LensKit to retain only the top K similar items, both libraries showed nearly identical nDCG values across all data sets. For instance, both achieved an nDCG of 0.2586 on the ML-1M data set with the same random seed. Before these adjustments, LensKit's original implementation surpassed RecBole only on the ModCloth data set.
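The idea of retaining only the top K similar items can be sketched as follows. This is a pure-Python illustration on implicit feedback with cosine similarity; the function names `item_cosine_similarities` and `keep_top_k` are hypothetical and do not correspond to RecBole or LensKit APIs, and the actual implementations in both libraries may differ in detail.

```python
import heapq
import math
from collections import defaultdict


def item_cosine_similarities(interactions):
    """Cosine similarity between items from (user, item) pairs.

    Returns a nested dict: sims[item][neighbor] = similarity (> 0 only).
    """
    users_by_item = defaultdict(set)
    for user, item in interactions:
        users_by_item[item].add(user)

    items = sorted(users_by_item)
    sims = {item: {} for item in items}
    for a_pos, a in enumerate(items):
        for b in items[a_pos + 1:]:
            overlap = len(users_by_item[a] & users_by_item[b])
            if overlap:
                score = overlap / math.sqrt(
                    len(users_by_item[a]) * len(users_by_item[b])
                )
                sims[a][b] = score
                sims[b][a] = score
    return sims


def keep_top_k(sims, k):
    """Keep only the k most similar neighbors of every item."""
    return {
        item: dict(heapq.nlargest(k, neighbors.items(), key=lambda kv: kv[1]))
        for item, neighbors in sims.items()
    }


if __name__ == "__main__":
    toy = [("u1", "a"), ("u1", "b"), ("u2", "a"),
           ("u2", "b"), ("u2", "c"), ("u3", "c")]
    full = item_cosine_similarities(toy)
    print(keep_top_k(full, k=1))
```

Pruning happens once on the similarity structure, so every later scoring step sees at most K neighbors per item. Weak similarities that would otherwise contribute to the predicted scores are dropped, which is consistent with the observation that the two libraries only agree once both apply the same pruning.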

Paper Structure (17 sections, 3 equations, 4 figures, 4 tables)

Introduction
Library introduction
Method
Data Sets
Algorithms
Pre-processing and Data Splitting
Algorithm Training and Evaluation
Hardware Specifications
Results
First Steps
Further Investigations
Adjustment of LensKit nDCG Calculation
Implementation Difference
Possible advantages of the RecBole-Implementation
Adjustment of LensKit ItemKNN
...and 2 more sections

Figures (4)

Figure 1: Comparison of nDCG@10 for ML-100K dataset
Figure 2: Comparison of nDCG@10 for ML-1M dataset
Figure 3: Comparison of nDCG@10 for Anime dataset
Figure 4: Comparison of nDCG@10 for Modcloth dataset
