PMLBmini: A Tabular Classification Benchmark Suite for Data-Scarce Applications

Ricardo Knauer; Marvin Grimm; Erik Rodner

TL;DR

PMLBmini is a tabular benchmark suite of 44 binary classification datasets with sample sizes $\leq$ 500. It is used to thoroughly evaluate current automated machine learning frameworks, off-the-shelf tabular deep neural networks, and classical linear models in the low-data regime.

Abstract

In practice, we are often faced with small tabular datasets. However, current tabular benchmarks are not geared towards data-scarce applications, making it very difficult to derive meaningful conclusions from empirical comparisons. We introduce PMLBmini, a tabular benchmark suite of 44 binary classification datasets with sample sizes $\leq$ 500. We use our suite to thoroughly evaluate current automated machine learning (AutoML) frameworks, off-the-shelf tabular deep neural networks, and classical linear models in the low-data regime. Our analysis reveals that state-of-the-art AutoML and deep learning approaches often fail to appreciably outperform even a simple logistic regression baseline, but we also identify scenarios where AutoML and deep learning methods are indeed reasonable to apply. Our benchmark suite, available at https://github.com/RicardoKnauer/TabMini, allows researchers and practitioners to analyze their own methods and challenge their data efficiency.

Paper Structure (12 sections, 3 figures, 1 table)

This paper contains 12 sections, 3 figures, 1 table.

Introduction
Related Work and Desiderata
Benchmark Design
Datasets
Available Methods
Python Interface
Experiments
Experimental Setup
Experimental Results
Meta-Feature Analysis
Broader Impact and Limitations
Conclusion

Figures (3)

Figure 1: Overview of our work on PMLBmini, the first tabular classification benchmark suite specifically for data-scarce applications.
Figure 2: Discriminative performance for AutoML, deep learning, and logistic regression on our benchmark suite PMLBmini.
Figure 3: What dataset meta-features influence model performance? The plot shows the meta-feature groups (Alcobaça et al., 2020) that are represented in the top-10 meta-features per approach. To that end, we computed all PyMFE meta-features per dataset; the mean test AUC difference between each AutoML / deep learning method and logistic regression per dataset; and the absolute Spearman rank correlation coefficient between each PyMFE meta-feature and the performance difference across datasets (see the Python Interface section). Finally, we selected the top-10 meta-features with the largest absolute correlations.
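
The selection procedure described in the Figure 3 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it does not call PyMFE, the function names (`average_ranks`, `spearman`, `top_meta_features`) are hypothetical, and the meta-feature values and AUC scores are made-up toy numbers rather than results from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant input is treated as uncorrelated
    return cov / (var_x * var_y) ** 0.5


def top_meta_features(meta_features, auc_method, auc_baseline, k=10):
    """Rank meta-features by |Spearman| with the per-dataset AUC difference.

    meta_features: {feature_name: [value per dataset]}
    auc_method, auc_baseline: mean test AUC per dataset, in the same order.
    """
    diff = [m - b for m, b in zip(auc_method, auc_baseline)]
    scores = {name: abs(spearman(vals, diff))
              for name, vals in meta_features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy values for five hypothetical datasets (illustrative only).
meta = {
    "n_samples": [50, 120, 200, 350, 500],
    "n_features": [30, 5, 12, 8, 20],
    "class_ratio": [0.5, 0.5, 0.5, 0.5, 0.5],
}
auc_automl = [0.70, 0.78, 0.83, 0.88, 0.93]
auc_logreg = [0.74, 0.79, 0.82, 0.85, 0.88]

top = top_meta_features(meta, auc_automl, auc_logreg, k=2)
for name, score in top:
    print(f"{name}: |rho| = {score:.2f}")
```

In this toy setup the AUC difference grows monotonically with the sample size, so `n_samples` ranks first. In an actual analysis, the `meta` dictionary would be filled with the meta-features extracted per dataset, and the two AUC lists with the measured mean test AUCs of the compared method and of logistic regression.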
