TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications

David Salinas, Nick Erickson

TL;DR

TabRepo is a large-scale repository of precomputed tabular-model evaluations and predictions covering 200 datasets and 1310 configurations. It enables offline analysis of HPO, ensembling, and transfer learning at marginal cost. Because it stores full model predictions, ensembles can be simulated quickly and portfolios can be learned offline, and transfer learning built on them reaches performance competitive with state-of-the-art tabular systems. The work shows that offline portfolio learning can surpass single-method tuning and, in some settings, compete with AutoML systems; AutoGluon has adopted the learned portfolios as defaults. TabRepo offers substantial compute savings and aims to accelerate research and practical deployment in tabular AutoML, while acknowledging ethical and scalability considerations. The authors release code and data openly to support reproducibility and community-driven improvement.

Abstract

We introduce TabRepo, a new dataset of tabular model evaluations and predictions. TabRepo contains the predictions and metrics of 1310 models evaluated on 200 classification and regression datasets. We illustrate the benefit of our dataset in multiple ways. First, we show that it enables analyses such as comparing hyperparameter optimization against current AutoML systems, while also considering ensembling, at marginal cost by using precomputed model predictions. Second, we show that our dataset can be readily leveraged to perform transfer learning. In particular, we show that applying standard transfer-learning techniques makes it possible to outperform current state-of-the-art tabular systems in accuracy, runtime, and latency.
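To make "ensembling at marginal cost by using precomputed model predictions" concrete, here is a minimal sketch of how an ensemble could be simulated from cached predictions without retraining any model. It uses greedy forward selection with replacement, a common ensemble-selection scheme; the summary above does not state which procedure the authors use, so treat the algorithm choice, the function names (`greedy_ensemble`, `ensemble_predict`), and the toy data as assumptions for illustration only.

```python
from typing import Dict, List


def mean_squared_error(y_true: List[float], y_pred: List[float]) -> float:
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def greedy_ensemble(
    cached_preds: Dict[str, List[float]], y_val: List[float], n_rounds: int
) -> List[str]:
    """Greedy forward selection (with replacement) over cached predictions.

    Each round adds the model whose inclusion gives the lowest validation
    error of the uniformly averaged ensemble. No model is ever refit.
    """
    chosen: List[str] = []
    running_sum = [0.0] * len(y_val)
    for _ in range(n_rounds):
        best_name, best_err = None, float("inf")
        k = len(chosen) + 1  # ensemble size if one more member is added
        for name, preds in cached_preds.items():
            avg = [(s + p) / k for s, p in zip(running_sum, preds)]
            err = mean_squared_error(y_val, avg)
            if err < best_err:
                best_name, best_err = name, err
        chosen.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, cached_preds[best_name])]
    return chosen


def ensemble_predict(cached_preds: Dict[str, List[float]], chosen: List[str]) -> List[float]:
    """Uniform average of the selected members' cached predictions."""
    n = len(next(iter(cached_preds.values())))
    return [sum(cached_preds[m][i] for m in chosen) / len(chosen) for i in range(n)]


# Hypothetical cached validation predictions for three models (toy values).
y_val = [1.0, 2.0, 3.0, 4.0]
cached = {
    "model_a": [1.5, 2.5, 3.5, 4.5],  # consistently too high
    "model_b": [0.5, 1.5, 2.5, 3.5],  # consistently too low
    "model_c": [3.0, 3.0, 3.0, 3.0],  # constant predictor
}

chosen = greedy_ensemble(cached, y_val, n_rounds=2)
blended = ensemble_predict(cached, chosen)
print(chosen, mean_squared_error(y_val, blended))
```

In this toy case the two biased models cancel each other out, so the simulated ensemble beats every single model. The only cost is averaging stored vectors, which is why cached predictions make such comparisons cheap.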

Paper Structure

This paper contains 19 sections, 8 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Cluster map of model rank for all datasets (left), correlation of model ranks (middle), and average runtime distribution over every dataset (right). For readability, only the first 3 configurations of each model family are displayed in the left and middle figures. Models that could not be fitted successfully are shown in black.
  • Figure 2: Normalized error for all model families when using default hyperparameters, tuned hyperparameters, and ensembling after tuning. All methods are run with a 4h budget.
  • Figure 3: Hyperparameter importance for each model family using the fANOVA method of Hutter et al. (2014). The y-axis is in log scale.
  • Figure 4: Top: scatter plots of average normalized error (left) and rank (right) against the training time budget. Bottom: critical difference (CD) diagram showing the average rank of the selected methods; methods that are statistically tied are connected by a horizontal bar.
  • Figure 5: Impact on normalized error when varying the (a) number of configurations per family, (b) number of training datasets, (c) portfolio size and (d) number of ensemble members.
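
The portfolio size studied in Figure 5 and the offline portfolio learning mentioned in the TL;DR can be illustrated with a small sketch. It assumes a precomputed table of errors per configuration and dataset, and greedily picks the configuration that most reduces the average best-in-portfolio error. This is one plausible formulation, not necessarily the authors' exact objective; the function name `greedy_portfolio`, the configuration names, and the error values are all hypothetical.

```python
from typing import Dict, List


def greedy_portfolio(errors: Dict[str, List[float]], size: int) -> List[str]:
    """Greedily build a portfolio of configurations from an error table.

    errors[name][j] is the (precomputed) error of configuration `name` on
    training dataset j. The portfolio score is the mean, over datasets, of
    the lowest error achieved by any portfolio member on that dataset.
    """
    n_datasets = len(next(iter(errors.values())))
    portfolio: List[str] = []
    best_so_far = [float("inf")] * n_datasets  # best error per dataset so far
    for _ in range(size):
        best_name, best_score = None, float("inf")
        for name, errs in errors.items():
            if name in portfolio:
                continue
            score = sum(min(b, e) for b, e in zip(best_so_far, errs)) / n_datasets
            if score < best_score:
                best_name, best_score = name, score
        if best_name is None:  # every configuration is already selected
            break
        portfolio.append(best_name)
        best_so_far = [min(b, e) for b, e in zip(best_so_far, errors[best_name])]
    return portfolio


# Toy error table: 3 hypothetical configurations on 3 training datasets.
errors = {
    "cfg_gbm": [0.10, 0.40, 0.30],
    "cfg_mlp": [0.30, 0.10, 0.35],
    "cfg_rf": [0.20, 0.25, 0.20],
}

portfolio = greedy_portfolio(errors, size=2)
print(portfolio)
```

The first pick is the configuration with the best average error on its own; later picks are chosen for how well they complement the members already selected, which is why a portfolio can outperform tuning a single method.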