Tabular Data: Is Deep Learning all you need?

Guri Zabërgja; Arlind Kadra; Christian M. M. Frey; Josif Grabocka

TL;DR

The paper conducts a large-scale, fair comparison of 17 tabular-classification methods across 68 OpenML datasets, using nested cross-validation and thorough hyperparameter optimization (HPO), and reports a paradigm shift in which Deep Learning methods outperform traditional gradient-boosted decision trees. It highlights the superior performance of meta-learned foundation models (e.g., TabICL, TabPFNv2) in most data regimes and shows that refitting on the combined training and validation data after HPO can further improve predictive quality and alter model rankings. The authors also assess transfer-learning paradigms, showing that in-context learning often outperforms fine-tuning, and provide an in-depth analysis of HPO effects and hyperparameter importance. An open-source benchmark with extensive results and a transparent experimental protocol aims to standardize future research and accelerate progress in deep learning and AutoML for tabular data.
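The evaluation protocol summarized above (nested cross-validation, HPO on an inner validation split, and optional refitting on train+validation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the 1-D k-nearest-neighbour model, the random-search tuner, the fold count, the trial budget, the toy data, and the helpers `knn_predict`, `k_folds`, and `nested_cv` are all hypothetical choices made here for illustration. The paper's actual models, search spaces, and budgets are given in its Evaluation Protocol section.

```python
import random
from statistics import mean

# ASSUMPTION: toy model, tuner, and budgets are illustrative, not the paper's setup.


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D feature)."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = {}
    for _, label in neighbors:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)  # ties: first label reaching the max count


def accuracy(train, test, k):
    return mean(1.0 if knn_predict(train, x, k) == y else 0.0 for x, y in test)


def k_folds(data, n_folds, seed):
    """Shuffle indices once, then deal them round-robin into n_folds folds."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    return [[data[i] for i in idx[f::n_folds]] for f in range(n_folds)]


def nested_cv(data, n_outer=5, n_trials=10, val_frac=0.2, refit=True, seed=0):
    rng = random.Random(seed)
    folds = k_folds(data, n_outer, seed)
    outer_scores = []
    for f, test in enumerate(folds):
        # Everything outside the outer test fold is available for HPO.
        train_full = [p for g, fold in enumerate(folds) if g != f for p in fold]
        rng.shuffle(train_full)
        n_val = max(1, int(len(train_full) * val_frac))
        val, train = train_full[:n_val], train_full[n_val:]
        # Inner loop: random search over k, scored on the validation split only.
        best_k, best_val = 1, -1.0
        for _ in range(n_trials):
            k = rng.randint(1, min(15, len(train)))
            score = accuracy(train, val, k)
            if score > best_val:
                best_k, best_val = k, score
        # Optional refit: retrain the chosen configuration on train + validation.
        fit_data = train + val if refit else train
        # The outer test fold is used exactly once, after HPO has finished.
        outer_scores.append(accuracy(fit_data, test, best_k))
    return mean(outer_scores)


# Hypothetical toy data: label = (x > 0.5), with roughly 10% of labels flipped.
data_rng = random.Random(42)
data = []
for _ in range(200):
    x = data_rng.random()
    y = int(x > 0.5)
    if data_rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

print("no refit:", nested_cv(data, refit=False))
print("refit:   ", nested_cv(data, refit=True))
```

Both calls use the same seed, so the tuner picks the same k in every outer fold; the only difference is whether the final model also sees the validation rows, which isolates the kind of refit effect the paper examines in Research Question 4.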

Abstract

Tabular data represent one of the most prevalent data formats in applied machine learning, largely because they accommodate a broad spectrum of real-world problems. Existing literature has studied many of the shortcomings of neural architectures on tabular data and has repeatedly confirmed the scalability and robustness of gradient-boosted decision trees across varied datasets. However, recent deep learning models have not been subjected to a comprehensive evaluation under conditions that allow for a fair comparison with existing classical approaches. This situation motivates an investigation into whether recent deep-learning paradigms outperform classical ML methods on tabular data. Our survey fills this gap by benchmarking seventeen state-of-the-art methods, spanning neural networks, classical ML and AutoML techniques. Our empirical results over 68 diverse datasets from a well-established benchmark indicate a paradigm shift, where Deep Learning methods outperform classical approaches.

Paper Structure (45 sections, 1 equation, 25 figures, 35 tables, 1 algorithm)

Introduction
Related Work
Experimental Protocol
Learning with Tabular Data
Experimental Setup
Baselines
Experiments and Results
Research Question 1: Do DL models outperform gradient boosting methods in tabular data classification?
Research Question 2: Do meta-learned NNs outperform data-specific NNs in tabular data classification?
Research Question 3: Which paradigm in transfer learning performs better: Do in-context models or fine-tuned models perform better?
Research Question 4: Does refitting after performing hyperparameter optimization have a significant impact on the predictive quality of the models, and does it impact the overall model ranking?
Research Question 5: What is the influence of hyperparameter optimization on a method's predictive performance?
Conclusion
Evaluation protocol and Configuration Spaces
Evaluation Protocol
...and 30 more sections

Figures (25)

Figure 1: Taxonomy tree of algorithms applied to tabular classification (TC) models
Figure 2: Left: Distribution of ranks for the Deep Learning (12 methods), Classical ML (3 methods) and AutoML (1 method) classifier families. Right: Distribution of ranks for the Foundation Models (5 methods), Dataset-Specific (7 methods) and AutoML (1 method) classifier families. In the boxplots, black lines mark medians, diamonds mark means, and whiskers show the range.
Figure 3: Win-rate dueling matrix comparing learning methods across shared datasets. Each cell (row $i$, column $j$) shows the fraction of common datasets on which method $i$ outperforms method $j$.
Figure 4: Dataset landscape showing the winning method family across dataset sizes. Each point represents a dataset from the OpenML-CC18 benchmark, positioned by number of rows (x-axis) and number of features (y-axis) on log scales. Colors indicate which method family achieved the highest accuracy: Deep Learning methods (orange), Classical ML tree-based models (green), or a tie (gray).
Figure 5: Critical difference (CD) diagram of the methods, where a horizontal bar indicates the absence of statistical significance. Left: CD diagram of Deep Learning vs. GBDTs, Right: CD diagram of dataset-specific vs. foundation models.
...and 20 more figures
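The rank distributions in Figure 2 and the win-rate dueling matrix in Figure 3 can both be computed from a table of per-dataset scores. A minimal sketch, assuming accuracy as the metric, average ranks for ties, and strict wins only in the win rate; the helper names `ranks_per_dataset` and `win_rate_matrix` and the scores below are hypothetical placeholders, not the paper's code or results. The critical difference diagrams in Figure 5 additionally require a statistical test on these ranks, which is not shown here.

```python
from statistics import mean

# ASSUMPTION: tie handling and metric are illustrative choices, not the paper's exact rules.


def ranks_per_dataset(scores):
    """scores: method -> list of accuracies, one per dataset (same order for all).
    Returns method -> list of ranks (1 = best); tied methods share the average rank.
    Assumes every method has a result on every dataset."""
    methods = list(scores)
    n_datasets = len(scores[methods[0]])
    ranks = {m: [] for m in methods}
    for d in range(n_datasets):
        ordered = sorted(methods, key=lambda m: -scores[m][d])
        i = 0
        while i < len(ordered):
            # Extend j over the run of methods tied with ordered[i].
            j = i
            while j + 1 < len(ordered) and scores[ordered[j + 1]][d] == scores[ordered[i]][d]:
                j += 1
            shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
            for m in ordered[i:j + 1]:
                ranks[m].append(shared_rank)
            i = j + 1
    return ranks


def win_rate_matrix(scores):
    """(a, b) -> fraction of shared datasets on which a strictly beats b.
    A None entry marks a missing result; ties count as a win for neither."""
    methods = list(scores)
    win = {}
    for a in methods:
        for b in methods:
            if a == b:
                continue
            shared = [d for d, (x, y) in enumerate(zip(scores[a], scores[b]))
                      if x is not None and y is not None]
            wins = sum(1 for d in shared if scores[a][d] > scores[b][d])
            win[(a, b)] = wins / len(shared) if shared else float("nan")
    return win


# Hypothetical accuracies for three placeholder methods on four datasets.
scores = {
    "method_A": [0.91, 0.80, 0.72, 0.88],
    "method_B": [0.89, 0.80, 0.75, 0.90],
    "method_C": [0.93, 0.83, 0.74, 0.92],
}
avg_rank = {m: mean(r) for m, r in ranks_per_dataset(scores).items()}
print(avg_rank)  # → {'method_A': 2.625, 'method_B': 2.125, 'method_C': 1.25}
print(win_rate_matrix(scores)[("method_A", "method_B")])  # → 0.25
```

Because ties count as a win for neither method, `win[(a, b)] + win[(b, a)]` can be less than 1, as for `method_A` and `method_B` above (0.25 + 0.5).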
