Tabular Data: Is Deep Learning all you need?
Guri Zabërgja, Arlind Kadra, Christian M. M. Frey, Josif Grabocka
TL;DR
The paper conducts a large-scale, fair comparison of 17 tabular-classification methods across 68 OpenML datasets, using nested cross-validation and thorough hyperparameter optimization (HPO). The results point to a paradigm shift in which deep learning methods prevail over traditional gradient-boosted trees. Meta-learned foundation models (e.g., TabICL, TabPFNv2) perform best in most data regimes, and refitting on the combined train+validation data after HPO can further improve predictive quality and alter model rankings. The authors also assess transfer-learning paradigms, showing that in-context learning often outperforms fine-tuning, and provide an in-depth analysis of HPO effects and hyperparameter importance. An open-source benchmark with extensive results and a transparent experimental protocol aims to standardize future research and accelerate progress in tabular-data deep learning and AutoML.
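The protocol summarized above (nested cross-validation, HPO on the inner folds, optional refit on train+validation) can be illustrated with a minimal sketch. This is not the authors' code: the toy 1-D nearest-neighbour classifier, the grid of candidate values, the fold counts, and the synthetic data are all assumptions chosen to keep the example self-contained.

```python
import random

def k_folds(n, k, seed):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def split(rows, folds, held_out):
    """Return (training rows, held-out rows) for one fold."""
    held = [rows[j] for j in folds[held_out]]
    train = [rows[j] for f, fold in enumerate(folds) if f != held_out for j in fold]
    return train, held

def knn_predict(train, x, k):
    """Toy 1-D k-nearest-neighbour classifier (hypothetical stand-in for a real model)."""
    votes = [y for _, y in sorted(train, key=lambda p: abs(p[0] - x))[:k]]
    return max(sorted(set(votes)), key=votes.count)

def accuracy(train, test, k):
    """Fraction of test rows whose label is predicted correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def nested_cv(rows, candidates=(1, 3, 5), outer_k=5, inner_k=3, refit=True):
    """Return one (chosen k, outer-test accuracy) pair per outer fold."""
    results = []
    outer = k_folds(len(rows), outer_k, seed=0)
    for i in range(outer_k):
        train_val, test = split(rows, outer, i)
        inner = k_folds(len(train_val), inner_k, seed=1)

        def inner_score(k):
            # Mean validation accuracy over the inner folds;
            # the outer test fold is never used for selection.
            parts = [split(train_val, inner, v) for v in range(inner_k)]
            return sum(accuracy(tr, val, k) for tr, val in parts) / inner_k

        # Grid search as a simple stand-in for a real HPO procedure.
        best_k = max(candidates, key=inner_score)
        # Refit: fit the final model on train+validation; otherwise
        # keep only the training part of the first inner split.
        fit_rows = train_val if refit else split(train_val, inner, 0)[0]
        results.append((best_k, accuracy(fit_rows, test, best_k)))
    return results

# Synthetic data (assumption): label is 1 when x > 0.5, with 10% label noise.
rng = random.Random(42)
rows = []
for _ in range(100):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    rows.append((x, y))

for name, flag in (("refit", True), ("no refit", False)):
    scores = [acc for _, acc in nested_cv(rows, refit=flag)]
    print(name, round(sum(scores) / len(scores), 3))
```

Calling `nested_cv` with `refit=True` and `refit=False` gives a like-for-like view of how much the extra validation rows change the final score. On a toy problem the gap may be negligible or even reversed.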
Abstract
Tabular data represent one of the most prevalent data formats in applied machine learning, largely because they accommodate a broad spectrum of real-world problems. Existing literature has studied many of the shortcomings of neural architectures on tabular data and has repeatedly confirmed the scalability and robustness of gradient-boosted decision trees across varied datasets. However, recent deep learning models have not been subjected to a comprehensive evaluation under conditions that allow for a fair comparison with existing classical approaches. This situation motivates an investigation into whether recent deep learning paradigms outperform classical ML methods on tabular data. Our survey fills this gap by benchmarking seventeen state-of-the-art methods, spanning neural networks, classical ML, and AutoML techniques. Our empirical results over 68 diverse datasets from a well-established benchmark indicate a paradigm shift, where deep learning methods outperform classical approaches.
