The Potential of AutoML for Recommender Systems

Tobias Vente; Joeran Beel

TL;DR

The paper investigates whether AutoML approaches can help inexperienced users build RecSys by benchmarking 60 algorithms from AutoML, AutoRecSys, ML, and RecSys libraries across 14 explicit-feedback datasets using default hyperparameters. It uses a Docker-based, fixed-resource evaluation with hold-out RMSE to compare methods, revealing that AutoRecSys and AutoML often outperform traditional RecSys and ML approaches, though results vary by dataset and many runs fail or underperform baselines due to resource and configuration limitations. The study highlights AutoML's potential for making RecSys more accessible while identifying gaps in AutoRecSys maturity and benchmarking infrastructure. It advocates for richer, standardized benchmarks and more sophisticated AutoRecSys integrations to realize automated, robust RecSys deployment.
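The hold-out RMSE protocol described above can be sketched in a few lines of Python, here scoring a global-mean predictor, the simplest kind of baseline such a benchmark compares against. This is a minimal illustration, not the authors' benchmark code: the 80/20 split, the fixed seed, and the helper names (`holdout_split`, `fit_mean_predictor`, `rmse`, `evaluate_mean_baseline`) are assumptions made for the example, and the Docker-based resource limits are not modeled.

```python
import math
import random
from typing import List, Tuple

# One explicit-feedback rating: (user_id, item_id, rating).
Rating = Tuple[str, str, float]


def holdout_split(ratings: List[Rating], test_fraction: float = 0.2,
                  seed: int = 42) -> Tuple[List[Rating], List[Rating]]:
    """Shuffle a copy of the ratings and hold out `test_fraction` of them.

    The 0.2 default and the seed are illustrative assumptions.
    """
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def fit_mean_predictor(train: List[Rating]) -> float:
    """Global-mean baseline: every prediction is the mean training rating."""
    return sum(r for _, _, r in train) / len(train)


def rmse(predictions: List[float], truths: List[float]) -> float:
    """Root mean squared error between predicted and observed ratings."""
    if len(predictions) != len(truths) or not truths:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predictions, truths))
    return math.sqrt(squared / len(truths))


def evaluate_mean_baseline(ratings: List[Rating], test_fraction: float = 0.2,
                           seed: int = 42) -> float:
    """Hold-out RMSE of the global-mean baseline on one dataset."""
    train, test = holdout_split(ratings, test_fraction, seed)
    mean = fit_mean_predictor(train)
    return rmse([mean] * len(test), [r for _, _, r in test])


if __name__ == "__main__":
    toy = [("u1", "i1", 4.0), ("u1", "i2", 3.0), ("u2", "i1", 5.0),
           ("u2", "i3", 2.0), ("u3", "i2", 4.0)]  # invented toy ratings
    print(f"mean-baseline RMSE: {evaluate_mean_baseline(toy):.3f}")
```

Swapping `fit_mean_predictor` for any trained model, while keeping the split and `rmse` fixed, gives the like-for-like comparison a hold-out benchmark relies on.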

Abstract

Automated Machine Learning (AutoML) has greatly advanced applications of Machine Learning (ML), including model compression, machine translation, and computer vision. Recommender Systems (RecSys) can be seen as an application of ML. Yet AutoML has received little attention in the RecSys community, nor has RecSys received notable attention in the AutoML community. Only a few, relatively simple Automated Recommender Systems (AutoRecSys) libraries exist that adopt AutoML techniques. Moreover, these libraries are based on student projects and do not offer the features and thorough development of AutoML libraries. We set out to determine how AutoML libraries perform in the scenario of an inexperienced user who wants to implement a recommender system. We compared the predictive performance of 60 AutoML, AutoRecSys, ML, and RecSys algorithms from 15 libraries, including a mean predictor baseline, on 14 explicit feedback RecSys datasets. To simulate the perspective of an inexperienced user, the algorithms were evaluated with default hyperparameters. We found that AutoML and AutoRecSys libraries performed best. AutoML libraries performed best on six of the 14 datasets (43%), but it was not always the same AutoML library performing best. The single-best library was the AutoRecSys library Auto-Surprise, which performed best on five datasets (36%). On three datasets (21%), AutoML libraries performed poorly, and RecSys libraries with default hyperparameters performed best. However, although RecSys algorithms obtained 50% of all top-five placements across the datasets, they fell behind AutoML on average. ML algorithms generally performed the worst.
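Figures such as "best on six of the 14 datasets" or "50% of all top-five placements" are simple aggregations over a per-dataset table of RMSE scores. The sketch below shows one way to compute them; the input layout (dataset name mapped to a list of `(algorithm, category, rmse)` tuples), the function names, and the toy data are hypothetical and not taken from the paper, and ties are broken by list order.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical layout: dataset name -> [(algorithm, category, rmse), ...]
Results = Dict[str, List[Tuple[str, str, float]]]


def best_category_per_dataset(results: Results) -> Dict[str, str]:
    """Category of the lowest-RMSE algorithm on each dataset (first wins ties)."""
    return {dataset: min(rows, key=lambda row: row[2])[1]
            for dataset, rows in results.items()}


def win_shares(results: Results) -> Dict[str, float]:
    """Fraction of datasets on which each category supplies the best algorithm."""
    wins = Counter(best_category_per_dataset(results).values())
    return {category: n / len(results) for category, n in wins.items()}


def top_k_shares(results: Results, k: int = 5) -> Dict[str, float]:
    """Share of all top-k placements, pooled over datasets, held by each category."""
    placements: Counter = Counter()
    for rows in results.values():
        # Lower RMSE is better; sorted() is stable, so ties keep list order.
        for _, category, _ in sorted(rows, key=lambda row: row[2])[:k]:
            placements[category] += 1
    total = sum(placements.values())
    return {category: n / total for category, n in placements.items()}


if __name__ == "__main__":
    toy: Results = {  # invented numbers, for illustration only
        "d1": [("a", "AutoML", 0.90), ("b", "RecSys", 0.95), ("c", "ML", 1.20)],
        "d2": [("d", "RecSys", 0.80), ("e", "AutoML", 0.85), ("f", "ML", 1.10)],
    }
    print(best_category_per_dataset(toy))
    print(win_shares(toy))
    print(top_k_shares(toy, k=2))
```

With 14 datasets, six AutoML wins give a share of 6/14 ≈ 0.43, which is the 43% reported above.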

Paper Structure (19 sections, 3 figures, 3 tables)

Introduction
Related Work
Methods
Datasets
Libraries
Experiment Setup
Results
Impacts of Resource Limits
Predictive Performance
Comparison to the Baseline
Discussion
Conclusion
Versions, URLs, and Licenses of all used Libraries and Datasets
Libraries
Datasets
(4 further sections not listed)

Figures (3)

Figure 1: Top 5 Algorithms per Dataset. For each dataset, the top 5 algorithms are shown, color-coded by category and labeled with their name and RMSE. A larger, more readable version of this figure is located in the appendix.
Figure 2: Order of Categories per Dataset. For each dataset, the 5 categories are shown, color-coded and sorted by the RMSE of their best-performing algorithm. The name and RMSE of each category's best-performing algorithm are also shown. A larger, more readable version of this figure is located in the appendix.
Figure 3: RMSE Boxplots for Each Category per Dataset. For each dataset, boxplots show the distribution of error values (RMSE) within each category. The plot excludes 61 error values greater than 2.
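The Figure 3 caption notes that error values greater than 2 are left out of the boxplots. A minimal sketch of that filtering step, followed by the quartile summary a boxplot draws, is given below. The function name, the five-number summary, and the use of Python's `statistics.quantiles` (default "exclusive" method) are assumptions for illustration; a plotting library may compute quartiles and whiskers differently.

```python
import statistics
from typing import Dict, List, Tuple


def boxplot_summary(errors: List[float],
                    cutoff: float = 2.0) -> Tuple[Dict[str, float], int]:
    """Drop errors above `cutoff`, then summarize the rest for a boxplot.

    Returns the five-number summary of the kept values and the number
    of values that were excluded (values equal to the cutoff are kept).
    """
    kept = sorted(e for e in errors if e <= cutoff)
    excluded = len(errors) - len(kept)
    if len(kept) < 2:
        raise ValueError("need at least two values at or below the cutoff")
    q1, _, q3 = statistics.quantiles(kept, n=4)  # quartile cut points
    summary = {
        "min": kept[0],
        "q1": q1,
        "median": statistics.median(kept),
        "q3": q3,
        "max": kept[-1],
    }
    return summary, excluded


if __name__ == "__main__":
    rmse_values = [0.80, 0.90, 1.00, 1.10, 1.20, 5.00, 12.30]  # toy values
    summary, n_excluded = boxplot_summary(rmse_values)
    print(summary, "excluded:", n_excluded)
```

Reporting the excluded count alongside the plot, as the caption does, keeps the clipping visible to readers.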
