XTab: Cross-table Pretraining for Tabular Transformers

Bingzhao Zhu, Xingjian Shi, Nick Erickson, Mu Li, George Karypis, Mahsa Shoaran

TL;DR

XTab enables cross-table pretraining for tabular transformers by separating data-specific featurizers and projection heads from a shared backbone and training that backbone across many tables with federated learning. Evaluated on 84 AMLB tasks, pretrained backbones consistently outperform randomly initialized baselines, and reconstruction objectives often yield the best downstream results. The framework supports multiple backbones and objectives and improves generalization and learning efficiency on downstream tabular tasks. This approach offers a scalable path toward cross-domain tabular pretraining and a foundation for further integration with tree-based methods and multimodal learning.

Abstract

The success of self-supervised learning in computer vision and natural language processing has motivated pretraining methods on tabular data. However, most existing tabular self-supervised learning models fail to leverage information across multiple data tables and cannot generalize to new tables. In this work, we introduce XTab, a framework for cross-table pretraining of tabular transformers on datasets from various domains. We address the challenge of inconsistent column types and quantities among tables by utilizing independent featurizers and using federated learning to pretrain the shared component. Tested on 84 tabular prediction tasks from the OpenML-AutoML Benchmark (AMLB), we show that (1) XTab consistently boosts the generalizability, learning speed, and performance of multiple tabular transformers, and (2) by pretraining FT-Transformer via XTab, we achieve performance superior to that of other state-of-the-art tabular deep learning models on various tasks such as regression, binary classification, and multiclass classification.
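The split between table-specific and shared components can be illustrated with a minimal pure-Python sketch of one federated round. This is not the authors' implementation: the `TableClient` class, the parameter dictionaries, and the plain unweighted average are all assumptions made for illustration. The only point it encodes from the text is that featurizers and projection heads stay local to each table while the backbone is aggregated across tables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical parameter container: name -> flat list of weights.
Params = Dict[str, List[float]]


@dataclass
class TableClient:
    """One pretraining table with its local parts and a copy of the backbone."""
    name: str
    featurizer: Params = field(default_factory=dict)  # table-specific, never averaged
    head: Params = field(default_factory=dict)        # table-specific, never averaged
    backbone: Params = field(default_factory=dict)    # shared, averaged across tables


def average_backbones(clients: List[TableClient]) -> Params:
    """Element-wise unweighted mean of the backbone parameters (an assumption)."""
    averaged: Params = {}
    for key in clients[0].backbone:
        per_position = zip(*(c.backbone[key] for c in clients))
        averaged[key] = [sum(vals) / len(clients) for vals in per_position]
    return averaged


def federated_round(
    clients: List[TableClient],
    local_update: Callable[[TableClient], None],
) -> Params:
    """Run local updates on every table, then synchronize only the backbone."""
    for client in clients:
        local_update(client)  # would train featurizer, head, and backbone locally
    shared = average_backbones(clients)
    for client in clients:
        # Each client receives its own copy so later local updates do not alias.
        client.backbone = {k: list(v) for k, v in shared.items()}
    return shared
```

In this sketch, two tables with different numbers of columns can carry featurizers of different shapes, because only `backbone` entries are ever combined.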

Paper Structure (33 sections, 2 equations, 11 figures, 16 tables)

Figures (11)

  • Figure 1: The model structure of XTab. XTab is pretrained on multiple tabular tasks (Tab. #1, #2, #3). Samples from different tables are featurized and fed into a transformer model with N blocks. The output of the transformer is further processed by projection heads to derive the pretraining losses. Featurizers and projection heads are data-specific since tables may have different input/output dimensions. The transformer backbone is shared across all pretraining tables to capture the general knowledge.
  • Figure 2: Tabular prediction performance of XTab using various evaluation criteria under the light finetuning setting. (a) The win rate of the pretrained transformer with respect to baseline. (b) The average rank of the models. (c) The normalized prediction performance. (d) The average error reduction rate compared to baseline. Each dot indicates a trial of the downstream task (5 trials per dataset). The error bars show standard deviations in (b) and (c). As the backbone is pretrained for more steps, we observe an increase in all evaluation criteria.
  • Figure 3: Comparison of different pretraining objectives under the light (a, c) and heavy (b, d) finetuning settings. We show the win rate of XTab with different objectives under (a) light and (b) heavy finetuning. We also compare the pretraining objectives in terms of model rank under (c) light and (d) heavy finetuning. We observe a consistent improvement of XTab over baseline models with all objectives. The reconstruction pretraining objective achieves the best performance, with a 71.0% win rate under light finetuning and 56.1% under heavy finetuning at 2000 pretraining steps.
  • Figure 4: XTab with transformer variants including FT-Transformer, Fastformer, and Saint-v. We use different transformer models as the shared backbone in XTab. We calculate the win rate of the pretrained backbone over randomly initialized transformers. (a) shows the results for light finetuning and (b) represents heavy finetuning. FT-Transformer, Fastformer, and Saint-v all benefit from our proposed cross-table pretraining, achieving >50% win rate in all experiments.
  • Figure 5: This figure is similar to the performance figure in the main paper, but contains more pretraining/finetuning configurations. See the caption and explanation there for more details.
  • ...and 6 more figures
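
Two of the evaluation criteria named in the Figure 2 caption, the win rate and the average error reduction rate, can be sketched as follows. The exact definitions used in the paper are not given in this section, so the conventions here are assumptions: a tie counts as half a win, and error reduction is measured relative to the baseline error. The function names are hypothetical.

```python
from typing import Sequence


def win_rate(pretrained_errors: Sequence[float], baseline_errors: Sequence[float]) -> float:
    """Fraction of trials where the pretrained model has lower error.

    Assumption: a tie counts as half a win.
    """
    if len(pretrained_errors) != len(baseline_errors) or not pretrained_errors:
        raise ValueError("need two non-empty sequences of equal length")
    score = 0.0
    for pretrained, baseline in zip(pretrained_errors, baseline_errors):
        if pretrained < baseline:
            score += 1.0
        elif pretrained == baseline:
            score += 0.5
    return score / len(pretrained_errors)


def mean_error_reduction(pretrained_errors: Sequence[float], baseline_errors: Sequence[float]) -> float:
    """Mean of (baseline - pretrained) / baseline over trials.

    Assumption: trials with a non-positive baseline error are skipped.
    """
    reductions = [
        (baseline - pretrained) / baseline
        for pretrained, baseline in zip(pretrained_errors, baseline_errors)
        if baseline > 0
    ]
    if not reductions:
        raise ValueError("no trial has a positive baseline error")
    return sum(reductions) / len(reductions)
```

Under these conventions, a win rate above 0.5 means the pretrained backbone beats the randomly initialized one in more trials than it loses, which is the threshold the Figure 4 caption refers to.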