Cross-Table Pretraining towards a Universal Function Space for Heterogeneous Tabular Data

Jintai Chen, Zhen Lin, Qiyuan Chen, Jimeng Sun

TL;DR

The paper tackles the challenge of transferring knowledge across heterogeneous tabular datasets by learning a universal meta-function space. It introduces CaLinear to create a calibratable linear basis and integrates it into XTFormer, a Transformer that expands this space into powerful non-linear mappings. Through a three-stage pipeline—cross-table pretraining, task calibration, and refinement—the approach achieves strong cross-table transfer, outperforming dominant GBDTs and deep tabular baselines across 190 downstream tasks, especially under data-scarce conditions. The key contribution is a scalable, data-efficient framework that enables rapid adaptation to new tabular prediction tasks while leveraging broad upstream knowledge.

Abstract

Tabular data from different tables exhibit significant diversity due to varied definitions and types of features, as well as complex inter-feature and feature-target relationships. Cross-dataset pretraining, which learns reusable patterns from upstream data to support downstream tasks, has shown notable success in various fields. Yet, when applied to tabular data prediction, this paradigm faces challenges due to the limited reusable patterns among diverse tabular datasets (tables) and the general scarcity of tabular data available for fine-tuning. In this study, we fill this gap by introducing a cross-table pretrained Transformer, XTFormer, for versatile downstream tabular prediction tasks. Our methodological insight is to pretrain XTFormer to establish a "meta-function" space that encompasses all potential feature-target mappings. During pretraining, a variety of potential mappings are extracted from the pretraining tabular datasets and embedded into the "meta-function" space; for downstream tasks, suitable mappings are then extracted from this space by a specified coordinate positioning approach. Experiments show that, across 190 downstream tabular prediction tasks, our cross-table pretrained XTFormer outperforms both XGBoost and CatBoost on 137 (72%) tasks, and surpasses the representative deep learning model FT-Transformer and the tabular pretraining approach XTab on 144 (76%) and 162 (85%) tasks, respectively.

Paper Structure (30 sections, 4 equations, 7 figures, 12 tables)

Figures (7)

  • Figure 1: Illustrating CaLinear with a set of basis linear layers (Linear) and a calibration module ($\texttt{M}_{cal}$). An element $v_n$ of $\mathbf{v} \in \mathbb{R}^N$ is input to $\texttt{M}_{cal}$ to yield the coefficients $[c^{(n)}_1, ..., c^{(n)}_M]$ for the feature embedding $\mathbf{z}_n$. Only $\mathbf{v}$ is tuned for a new dataset.
  • Figure 2: An illustration of the XTFormer framework. In the task calibration fine-tuning phase, the self-attention models and the basis functions of the CaLinear layers within the transformer blocks remain frozen; only the dataset-specific components (the learnable context vector $\mathbf{c}$ and the normalization layers) are updated.
  • Figure 3: An illustration of task calibration and refinement. The task calibration (left) operates within the established function space to search for a dataset-suited model, while the refinement (right) optimizes all parameters for an improved model and may extend beyond the established space.
  • Figure 4: One-on-one comparison of XGBoost, CatBoost, and XTFormer against representative deep learning approaches. XGBoost and CatBoost outperform or match existing deep learning approaches on a majority of datasets. However, our XTFormer beats both XGBoost and CatBoost, changing the landscape of the competition between deep learning and GBDTs.
  • Figure 5: The win rate of the pre-trained XTFormer in comparison to the baseline, across varying numbers of pretraining epochs.
  • ...and 2 more figures
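
The CaLinear mechanism described in the Figure 1 caption can be sketched in a few lines of plain Python. This is only an illustrative reading of the caption, not the authors' implementation: the class name `CaLinearSketch`, the affine-plus-softmax form of the calibration module, the treatment of $v_n$ as a scalar, and all dimensions are assumptions made for the example. What it shows is the general idea of mixing $M$ shared basis linear layers with per-feature coefficients derived from $v_n$.

```python
import math
import random


def linear(W, b, z):
    """Apply one basis linear layer: W @ z + b (W is a list of rows)."""
    return [sum(w * x for w, x in zip(row, z)) + bias for row, bias in zip(W, b)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class CaLinearSketch:
    """Hypothetical calibratable linear layer: M shared bases mixed per feature."""

    def __init__(self, d_in, d_out, num_bases, seed=0):
        rng = random.Random(seed)
        # M basis linear layers, shared across datasets (frozen during calibration).
        self.bases = [
            (
                [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(d_out)],
                [0.0] * d_out,
            )
            for _ in range(num_bases)
        ]
        # Assumed form of M_cal: an affine map from the scalar v_n to M logits,
        # followed by a softmax. The caption does not specify this form.
        self.cal_w = [rng.gauss(0.0, 1.0) for _ in range(num_bases)]
        self.cal_b = [0.0] * num_bases

    def coefficients(self, v_n):
        """Map one element v_n to the coefficients [c_1^(n), ..., c_M^(n)]."""
        return softmax([w * v_n + b for w, b in zip(self.cal_w, self.cal_b)])

    def forward(self, Z, v):
        """Z holds N feature embeddings z_n; v holds N calibration scalars v_n."""
        out = []
        for z_n, v_n in zip(Z, v):
            c = self.coefficients(v_n)
            basis_outputs = [linear(W, b, z_n) for W, b in self.bases]
            d_out = len(basis_outputs[0])
            # Coefficient-weighted sum of the M basis outputs for feature n.
            out.append(
                [sum(c_m * y[j] for c_m, y in zip(c, basis_outputs)) for j in range(d_out)]
            )
        return out


layer = CaLinearSketch(d_in=4, d_out=3, num_bases=5)
Z = [[0.5, -1.0, 0.2, 0.0], [1.0, 0.3, -0.7, 2.0]]  # N = 2 feature embeddings
v = [0.3, -1.2]  # the only values tuned for a new dataset in this sketch
out = layer.forward(Z, v)
```

In this sketch, adapting to a new dataset would mean updating only `v` while `bases`, `cal_w`, and `cal_b` stay fixed, which mirrors the caption's statement that only $\mathbf{v}$ is tuned for a new dataset.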