Revisiting Pretraining Objectives for Tabular Deep Learning

Ivan Rubachev; Artem Alekberov; Yury Gorishniy; Artem Babenko

TL;DR

This work systematically evaluates pretraining strategies for tabular deep learning in fully supervised settings, across eleven diverse datasets and multiple backbone architectures. It reveals that simple self-prediction objectives often rival or outperform contrastive approaches, and that leveraging target labels during pretraining (target-aware objectives) yields additional gains. Target-aware methods, especially when combined with numerical feature embeddings, can push tabular deep models beyond the performance of strong GBDT baselines, with ensembling and careful finetuning on clean data further boosting results. The findings provide practical recipes for practitioners, highlighting when pretraining helps most, the value of target information, and the compute considerations for deploying tabular pretrained models in real-world tasks.
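Self-prediction objectives train a model to recover information about its own input. Mask prediction is one such objective: a random subset of cells is replaced by values drawn from the same column of randomly chosen rows, and the model must predict which cells were replaced. The NumPy sketch below illustrates the data side and the loss. The corruption scheme, the probability `p`, and the function names `corrupt_features` and `mask_prediction_loss` are illustrative assumptions, not the paper's exact implementation. The backbone that produces the logits is omitted.

```python
import numpy as np


def corrupt_features(x: np.ndarray, p: float, rng: np.random.Generator):
    """Hypothetical corruption: replace each cell with probability p by the value
    of the same column in a randomly chosen donor row (empirical marginal)."""
    n, d = x.shape
    mask = rng.random((n, d)) < p
    # One donor row per cell; the column index is kept, so values stay in-column.
    # The donor may coincide with the row itself (acceptable for a simple sketch).
    donor_rows = rng.integers(0, n, size=(n, d))
    donor_values = x[donor_rows, np.arange(d)]
    x_corrupted = np.where(mask, donor_values, x)
    return x_corrupted, mask.astype(np.float64)


def mask_prediction_loss(logits: np.ndarray, mask: np.ndarray) -> float:
    """Mean binary cross-entropy between per-cell logits and the 0/1 mask,
    written as softplus(z) - y * z for numerical stability."""
    return float(np.mean(np.logaddexp(0.0, logits) - mask * logits))
```

In a training loop, the corrupted batch would be fed to the backbone. A per-feature head would then output `logits` of shape `(n, d)`, and `mask_prediction_loss(logits, mask)` would be minimized. After pretraining, the head is discarded and the backbone is finetuned on the labeled data.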

Abstract

Recent deep learning models for tabular data now compete with traditional ML models based on gradient-boosted decision trees (GBDT). Unlike GBDT, deep models can additionally benefit from pretraining, which is a workhorse of DL for vision and NLP. Several pretraining methods have been proposed for tabular problems. However, it is not entirely clear whether pretraining provides consistent, noticeable improvements or which method should be used, because the methods are often not compared to each other, or the comparison is limited to the simplest MLP architectures. In this work, we aim to identify best practices for pretraining tabular DL models that can be applied universally across datasets and architectures. Among our findings, we show that using object target labels during the pretraining stage benefits downstream performance, and we advocate several target-aware pretraining objectives. Overall, our experiments demonstrate that properly performed pretraining significantly increases the performance of tabular DL models, often making them superior to GBDTs.
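One simple way to make pretraining target-aware is to add a supervised term to a self-prediction loss computed on the same corrupted batch. The NumPy sketch below makes three assumptions: a binary classification target, a mask-prediction self-prediction term, and a hypothetical weight `alpha`. It illustrates the general idea of using labels during pretraining. It is not necessarily one of the specific objectives the paper advocates.

```python
import numpy as np


def bce_with_logits(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean binary cross-entropy, computed as softplus(z) - y * z."""
    return float(np.mean(np.logaddexp(0.0, logits) - targets * logits))


def target_aware_pretraining_loss(mask_logits, mask, target_logits, y, alpha=1.0):
    """Hypothetical multi-task pretraining loss.

    mask_logits, mask: (n, d) per-cell predictions and 0/1 corruption mask.
    target_logits, y:  (n,) predictions and binary labels for the same batch.
    alpha:             weight of the supervised (label-using) term.
    """
    self_prediction = bce_with_logits(mask_logits, mask)
    supervised = bce_with_logits(target_logits, y)
    return self_prediction + alpha * supervised
```

Setting `alpha=0` recovers a purely self-supervised objective, so the weight controls how much label information enters pretraining. For regression targets, the supervised term would typically be replaced by a squared error.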

Paper Structure

This paper contains 23 sections, 1 figure, and 14 tables.

Introduction
Related Work
Revisiting pretraining objectives
Experimental setup
Comparing pretraining objectives
Target-aware pretraining objectives
Comparing target-aware objectives
Comparison to GBDT
Analysis
Investigating the properties of pretrained models
Efficient ensembling
On importance of finetuning on clean data
Does pretraining require more compute?
Conclusion
Datasets

Figures (1)

Figure 1: The decodability of object features from the intermediate representations computed by pretrained models and by models trained from scratch. The pretrained models capture information about all features reasonably well, while the randomly initialized models capture the most informative features and suppress the others.
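Decodability of this kind is commonly measured with a probe. A simple model is fit to predict each original feature from a frozen intermediate representation, and its held-out quality indicates how much information about that feature the representation retains. The NumPy sketch below assumes a linear ridge probe scored by per-feature R^2. The probe type, the regularization strength `l2`, and the function name `feature_decodability` are illustrative assumptions, and the paper's exact protocol may differ.

```python
import numpy as np


def _add_bias(h: np.ndarray) -> np.ndarray:
    """Append a constant column so the probe can learn an intercept."""
    return np.hstack([h, np.ones((h.shape[0], 1))])


def feature_decodability(h_train, x_train, h_test, x_test, l2=1e-3):
    """Per-feature held-out R^2 of a linear ridge probe h -> x (hypothetical helper).

    h_*: (n, k) frozen representations; x_*: (n, d) original object features.
    """
    a = _add_bias(h_train)
    reg = l2 * np.eye(a.shape[1])
    reg[-1, -1] = 0.0  # leave the intercept unpenalized
    # Closed-form ridge solution: one column of weights per original feature.
    w = np.linalg.solve(a.T @ a + reg, a.T @ x_train)  # shape (k + 1, d)
    pred = _add_bias(h_test) @ w
    ss_res = ((x_test - pred) ** 2).sum(axis=0)
    ss_tot = ((x_test - x_test.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot
```

Under these assumptions, a representation that preserves a feature yields an R^2 close to 1 for it. A feature the representation has discarded scores near 0, or slightly below on held-out data.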
