e-Fold Cross-Validation for Recommender-System Evaluation

Moritz Baumgart; Lukas Wegmeth; Tobias Vente; Joeran Beel

TL;DR

This paper tackles the high energy cost of evaluating recommender systems with traditional $k$-fold cross-validation. It introduces e-fold cross-validation (e-CV), which stops evaluating further folds once a confidence-interval width criterion is met, via the rule $|c_{n-1}-c_n| \le \frac{\alpha}{c_n}$, reducing energy use while preserving robustness. The authors validate e-CV on 5 algorithms across 6 datasets against a 10-CV baseline. On average, e-CV stopped after $4.15$ folds, used $41.5\%$ of the energy of 10-CV, and produced results that differed from 10-CV by $1.81\%$. Effectiveness varies across datasets and algorithms; the stability of the resulting rankings suggests the method is practical and points to avenues for refinement.

Abstract

To combat the rising energy consumption of recommender systems, we implement a novel alternative to k-fold cross-validation. This alternative, named e-fold cross-validation, aims to minimize the number of folds in order to reduce power usage while keeping the reliability and robustness of the test results high. We tested our method on 5 recommender-system algorithms across 6 datasets and compared it with 10-fold cross-validation. On average, e-fold cross-validation needed only 41.5% of the energy that 10-fold cross-validation would need, while its results differed by only 1.81%. We conclude that e-fold cross-validation is a promising approach with the potential to be an energy-efficient but still reliable alternative to k-fold cross-validation.
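
The stopping rule quoted in the TL;DR can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes $c_n$ is the width of a normal-approximation confidence interval around the mean score after $n$ folds, and the names `ci_width`, `e_fold_cv`, `min_folds`, and the default `alpha` are hypothetical. Fold scores are consumed lazily, so folds after the stopping point are never evaluated, which is where the energy saving would come from.

```python
import math
from statistics import mean, stdev


def ci_width(scores, z=1.96):
    """Width of a normal-approximation CI for the mean (assumed meaning of c_n)."""
    return 2 * z * stdev(scores) / math.sqrt(len(scores))


def e_fold_cv(fold_scores, alpha=0.01, min_folds=3, max_folds=10):
    """Consume per-fold scores lazily and stop once |c_{n-1} - c_n| <= alpha / c_n.

    fold_scores: any iterable (e.g. a generator that trains and evaluates one
    fold per item). Returns (folds_used, mean_score).
    min_folds and the default alpha are illustrative assumptions.
    """
    seen = []
    prev_width = None
    for score in fold_scores:
        seen.append(score)
        n = len(seen)
        if n >= 2:  # a CI width needs at least two scores
            width = ci_width(seen)
            if prev_width is not None and n >= min_folds:
                # width == 0 means identical scores; treat as converged
                if width == 0 or abs(prev_width - width) <= alpha / width:
                    break
            prev_width = width
        if n >= max_folds:
            break
    return len(seen), mean(seen)


def lazy_folds(scores, log):
    """Stand-in for expensive fold evaluation; records which folds actually ran."""
    for i, s in enumerate(scores):
        log.append(i)
        yield s
```

With a real recommender, `lazy_folds` would be replaced by a generator that trains and scores one fold per iteration. The reported averages are mutually consistent with energy scaling roughly linearly in the number of folds: stopping after 4.15 of 10 folds corresponds to 41.5% of the 10-CV cost.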
