ExcelFormer: A neural network surpassing GBDTs on tabular data
Jintai Chen, Jiahuan Yan, Qiyuan Chen, Danny Ziyi Chen, Jian Wu, Jimeng Sun
TL;DR
ExcelFormer targets the persistent challenges of tabular data prediction by combining a semi-permeable attention mechanism with an interaction-attenuated initialization and a GLU-based attentive FFN, complemented by two tabular-specific data augmentations, Feat-Mix and Hid-Mix. This design disrupts rotational invariance and improves data efficiency while maintaining a model size comparable to existing tabular Transformers. Across 96 small and 21 large real-world datasets, ExcelFormer consistently outperforms GBDTs and prior DNNs, both with default hyperparameters and with limited tuning, demonstrating strong accuracy and practical usability. The work shows that careful architectural choices and augmentation strategies can yield a robust, user-friendly “sure bet” solution for diverse tabular prediction tasks, reducing the need for casual users to perform extensive hyperparameter search and model selection.
Abstract
Data organized in tabular format is ubiquitous in real-world applications, and users often craft tables with biased feature definitions and flexibly set prediction targets according to their interests. A robust, effective, dataset-versatile, and user-friendly tabular prediction approach is therefore highly desirable. While Gradient Boosting Decision Trees (GBDTs) and existing deep neural networks (DNNs) have been extensively used by professional users, they present several challenges for casual users, particularly: (i) the dilemma of model selection, since different models prefer different datasets, and (ii) the need for heavy hyperparameter search, without which their performance is often inadequate. In this paper, we address the following question: Can we develop a deep learning model that serves as a "sure bet" solution for a wide range of tabular prediction tasks, while also being user-friendly for casual users? We examine three key drawbacks of deep tabular models: (P1) lack of a rotational variance property, (P2) large data demand, and (P3) over-smooth solutions. We propose ExcelFormer, which addresses these drawbacks through a semi-permeable attention module that constrains the influence of less informative features and thereby breaks the rotational invariance of DNNs (for P1), data augmentation approaches tailored to tabular data (for P2), and an attentive feedforward network that boosts the model's fitting capability (for P3). Together, these designs make ExcelFormer a "sure bet" solution for diverse tabular datasets. Extensive and stratified experiments on real-world datasets demonstrate that our model outperforms previous approaches across diverse tabular prediction tasks, and that the framework is friendly to casual users, offering ease of use without heavy hyperparameter tuning.
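To make the idea of constraining less informative features concrete, the following is a minimal sketch of one way an asymmetric attention mask could work. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the mask direction or the importance measure, so the rule used here (a feature may attend only to features at least as informative as itself), the importance scores, and the function names semi_permeable_mask and masked_softmax are all hypothetical.

```python
import math


def semi_permeable_mask(importance):
    """Build a boolean mask where mask[i][j] is True when feature i (query)
    may attend to feature j (key).

    Assumption for illustration: feature i may only read from features that
    are at least as informative as itself, so less informative features
    cannot influence more informative ones.
    """
    n = len(importance)
    return [[importance[j] >= importance[i] for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax over attention scores, ignoring masked-out entries."""
    weights = []
    for row, allowed in zip(scores, mask):
        # The diagonal is always allowed, so `kept` is never empty.
        kept = [s for s, a in zip(row, allowed) if a]
        peak = max(kept)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if a else 0.0 for s, a in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical importance scores for three features (higher = more informative).
importance = [0.9, 0.5, 0.1]
# Uniform raw attention scores, so the effect of the mask is easy to read off.
scores = [[0.0, 0.0, 0.0] for _ in range(3)]

mask = semi_permeable_mask(importance)
weights = masked_softmax(scores, mask)

for row in weights:
    print([round(w, 3) for w in row])
```

In this toy setting the most informative feature attends only to itself, while the least informative feature can read from all three. The asymmetry also makes the layer depend on which feature is which, which is one intuition for why such a mask works against rotational invariance.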
