Fine-tuned In-Context Learning Transformers are Excellent Tabular Data Classifiers

Felix den Breejen; Sangmin Bae; Stephen Cha; Se-Young Yun

TL;DR

This work extends TabPFN by enabling fine-tuning of in-context learning transformers for tabular data, revealing that fine-tuning dramatically boosts performance and enables complex decision boundaries akin to tree-based methods. To maximize this capability, the authors introduce a Forest Dataset Generator and a combined TabForestPFN pretraining strategy that blends TabPFN and forest data, achieving strong fine-tuning performance while maintaining competitive zero-shot results. Empirical results on TabZilla and WhyTrees show that fine-tuned ICL-transformers can rival traditional tree models on challenging datasets, with boundary complexity correlating with improved accuracy. The findings suggest a shift toward leveraging ICL-transformers for tabular data, while also highlighting practical considerations such as GPU memory constraints and the potential for extending the approach to regression and interpretability tasks.

Abstract

The recently introduced TabPFN pretrains an In-Context Learning (ICL) transformer on synthetic data to perform tabular data classification. In this work, we extend TabPFN to the fine-tuning setting, resulting in a significant performance boost. We also discover that fine-tuning enables ICL-transformers to create complex decision boundaries, a property regular neural networks do not have. Based on this observation, we propose to pretrain ICL-transformers on a new forest dataset generator which creates datasets that are unrealistic, but have complex decision boundaries. TabForest, the ICL-transformer pretrained on this dataset generator, shows better fine-tuning performance when pretrained on more complex datasets. Additionally, TabForest outperforms TabPFN on some real-world datasets when fine-tuning, despite having lower zero-shot performance due to the unrealistic nature of the pretraining datasets. By combining both dataset generators, we create TabForestPFN, an ICL-transformer that achieves excellent fine-tuning performance and good zero-shot performance.
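The abstract does not spell out how the forest dataset generator works, so the following is only a minimal sketch of the idea, not the authors' implementation. It assumes that a random axis-aligned partition tree is grown over a set of "base" points and then used to label freshly sampled, uncorrelated features. That yields the orthogonal, complex decision boundaries described in the Figure 3 caption. The parameter names (`base_size`, `dataset_size`, `max_depth`) and default values mirror that caption; the helper names `build_tree`, `predict`, and `generate_forest_dataset` are hypothetical.

```python
import random


def build_tree(points, depth, max_depth, n_classes, rng):
    """Recursively partition `points` with random axis-aligned splits.

    Illustrative assumption: leaves receive a uniformly random class label.
    """
    # Stop at the depth limit or when too few points remain to split.
    if depth >= max_depth or len(points) < 2:
        return {"label": rng.randrange(n_classes)}
    feature = rng.randrange(len(points[0]))
    # Use the coordinate of a randomly chosen base point as the threshold.
    threshold = rng.choice(points)[feature]
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    # A degenerate split (one side empty) becomes a leaf.
    if not left or not right:
        return {"label": rng.randrange(n_classes)}
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, n_classes, rng),
        "right": build_tree(right, depth + 1, max_depth, n_classes, rng),
    }


def predict(tree, x):
    """Walk from the root to a leaf and return that leaf's class label."""
    while "label" not in tree:
        branch = "left" if x[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[branch]
    return tree["label"]


def generate_forest_dataset(base_size=1024, dataset_size=1024, max_depth=25,
                            n_features=2, n_classes=None, seed=0):
    """Return (X, y): independent uniform features labeled by a random tree."""
    rng = random.Random(seed)
    if n_classes is None:
        # Figure 3 uses between 2 and 10 classes per dataset.
        n_classes = rng.randint(2, 10)
    # Base points only determine where the tree places its splits.
    base = [[rng.random() for _ in range(n_features)] for _ in range(base_size)]
    tree = build_tree(base, 0, max_depth, n_classes, rng)
    # Features are drawn independently, so they carry no correlation.
    X = [[rng.random() for _ in range(n_features)] for _ in range(dataset_size)]
    y = [predict(tree, x) for x in X]
    return X, y
```

Calling `generate_forest_dataset(seed=3)` would produce one synthetic dataset. In this sketch, a larger `base_size` or `max_depth` allows more splits and therefore a more fragmented decision boundary, which is the kind of complexity the abstract associates with better fine-tuning performance.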

Paper Structure (26 sections, 2 equations, 14 figures, 5 tables, 2 algorithms)

Introduction
Related Works
Preliminaries
TabPFN Dataset Generator
Architecture
Methodology
Forest Dataset Generation
Fine-tuning Procedure
Experiments
Introduction of the Benchmark Datasets
Main Results of TabForestPFN
Complexity of ICL-Transformers' Decision Boundaries
Ablation of the Forest Dataset Generator
Case Study of the Gap between Neural Networks and Tree Algorithms
Improvement of Fine-tuning over Zero-shot
...and 11 more sections

Figures (14)

Figure 1: Comparison of decision boundaries for the Electricity dataset (OpenML ID 44156). Axes represent features, colors are predicted class probabilities, and dots are test observations. Fine-tuned variants show a higher complexity score $V$ (see Section \ref{sec:results_decision_boundaries}, "Complexity of ICL-Transformers' Decision Boundaries") than zero-shot variants.
Figure 2: Base ICL-transformer architecture. On the left, dataset features and targets are separately encoded into tokens. On the right, the targets of the query dataset are used as labels. In the middle is the ICL-transformer with the attention mask.
Figure 3: Generated forest data. Every box is a generated dataset with its own classes (color) and features (axes). The data clouds look unrealistic: decision boundaries are always orthogonal, and there is no feature correlation. Generated with a base size of 1024, a dataset size of 1024, a maximum tree depth between 1 and 25, two features, and between 2 and 10 classes.
Figure 4: Main results on the WhyTrees Benchmark. TabForestPFN shows the mean over ten default runs for different fine-tuning seeds; all others use random search over the hyperparameters. See Table \ref{table:whytreestab} for other ICL-transformers.
Figure 5: Ablation of the base size and maximum tree depth parameters of the Forest Dataset Generator. Figure shows normalized test accuracy of TabForest on the WhyTrees benchmark.
...and 9 more figures
