iLTM: Integrated Large Tabular Model

David Bonet, Marçal Comajoan Cara, Alvaro Calafell, Daniel Mas Montserrat, Alexander G. Ioannidis

TL;DR

iLTM addresses the persistent gap in tabular learning between gradient-boosted decision trees (GBDTs) and foundation-model-scale neural approaches by integrating GBDT embeddings, a meta-trained hypernetwork, retrieval augmentation, and MLP backbones into a single architecture. Pretrained on more than 1,800 real-world classification datasets, iLTM delivers strong performance across classification and regression tasks and, because the hypernetwork generates weights conditioned on each dataset, transfers from classification to regression with light fine-tuning. Key contributions include the neural-tree hybrid design, large-scale meta-training, cross-task transfer, and an open-source implementation, which together reduce the need for task-specific tuning. The approach combines tree-based inductive biases with neural flexibility, offering a practical path toward tabular foundation models that generalize across diverse datasets and scales, including high-dimensional biomedical data and industry-grade benchmarks like TabReD.

Abstract

Tabular data underpins decisions across science, industry, and public services. Despite rapid progress, advances in deep learning have not fully carried over to the tabular domain, where gradient-boosted decision trees (GBDTs) remain a default choice in practice. We present iLTM, an integrated Large Tabular Model that unifies tree-derived embeddings, dimensionality-agnostic representations, a meta-trained hypernetwork, multilayer perceptrons (MLPs), and retrieval within a single architecture. Pretrained on more than 1,800 heterogeneous classification datasets, iLTM achieves consistently superior performance across tabular classification and regression tasks, from small datasets to large and high-dimensional tasks. After light fine-tuning, the meta-trained hypernetwork transfers to regression targets, matching or surpassing strong baselines. Extensive experiments show that iLTM outperforms well-tuned GBDTs and leading deep tabular models while requiring less task-specific tuning. By bridging the gap between tree-based and neural methods, iLTM offers a new framework for tabular foundation models for robust, adaptable, and scalable tabular learning.

Paper Structure

This paper contains 68 sections, 18 equations, 7 figures, 23 tables, 2 algorithms.

Figures (7)

  • Figure 1: Overview of the iLTM architecture. Raw tabular datasets pass through GBDT embeddings and/or robust preprocessing, then a dimensionality-agnostic representation feeds a meta-trained hypernetwork that generates the MLP layers (3 layers, 512 hidden units) of the main network. The main network predictions are augmented with retrieval weighted by $\alpha$.
  • Figure 2: GBDT embedding process. (a) Decision tree splits on $d_1$ and $d_2$ leading to leaves $\{l_1, \ldots, l_5\}$. (b) 2D visualization of datapoints assigned to leaf cells. (c) Each data point is transformed into a one-hot vector. (d) Concatenated one-hot encodings across trees in the ensemble obtaining a sparse binary matrix $\Gamma(\boldsymbol{X})$.
  • Figure 3: Critical difference diagram (Demšar, 2006) of average AUC rankings, showing algorithm groups determined by the Conover post-hoc test (Conover and Iman, 1979) following a Friedman test (Friedman, 1937) at significance level 0.05. Algorithms connected by a horizontal bar show no statistically significant difference in performance. For algorithms with missing runs, the average rank was imputed from their performance on completed datasets; algorithms with more than ten missing datasets were removed. Note that iLTM and TabPFNv2 (among others) ran on all datasets.
  • Figure 4: Average rank ($\downarrow$) on the 18 public regression datasets from Grinsztajn et al. (2022), also used in Gorishniy et al. (2024) for TabR and TabM. The meta-trained iLTM, fine-tuned on each task, achieves the top overall rank, outperforming gradient-boosted trees and performing on par with recent deep tabular models, illustrating that features learned during classification pretraining transfer effectively to regression problems.
  • Figure 5: Left: t-SNE of concatenated hypernetwork embeddings across all three layers in embedding space, with blue and red points representing Tabzilla Hard and high-dimensional biomedical datasets, respectively. Center: PCA of weights from 8 ensemble predictors trained on each dataset, showing the first two layers separately in weight space. Right: t-SNE of the weights from the 8 predictors, with weights concatenated across all three layers in weight space, colored by normalized AUC.
  • ...and 2 more figures
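
The GBDT embedding described in the Figure 2 caption (route each data point to a leaf, one-hot encode that leaf, and concatenate the encodings across trees into a sparse binary matrix $\Gamma(\boldsymbol{X})$) can be illustrated with a small, dependency-free Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: the `Leaf` and `Split` classes, the hand-written toy trees, and the function names are hypothetical, and a real pipeline would take the trees from a fitted GBDT library rather than define them by hand.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Leaf:
    index: int  # 0-based position of this leaf within its tree


@dataclass
class Split:
    feature: int      # column of x to test
    threshold: float  # go left if x[feature] <= threshold, else right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def count_leaves(node: Node) -> int:
    """Number of leaves below (and including) this node."""
    if isinstance(node, Leaf):
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


def leaf_index(node: Node, x: Sequence[float]) -> int:
    """Follow the splits until a leaf is reached and return its index."""
    while isinstance(node, Split):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.index


def gbdt_embedding(trees: List[Node], X: Sequence[Sequence[float]]) -> List[List[int]]:
    """One-hot encode the leaf hit in each tree and concatenate across trees."""
    sizes = [count_leaves(t) for t in trees]
    rows = []
    for x in X:
        row: List[int] = []
        for tree, n_leaves in zip(trees, sizes):
            one_hot = [0] * n_leaves
            one_hot[leaf_index(tree, x)] = 1
            row.extend(one_hot)
        rows.append(row)
    return rows


# Toy ensemble (hypothetical): a 5-leaf tree splitting on d1 (feature 0) and
# d2 (feature 1), as in panel (a) of Figure 2, plus a 2-leaf stump on d2.
tree_a = Split(0, 0.5,
               Split(1, 0.3, Leaf(0), Leaf(1)),
               Split(1, 0.6, Leaf(2), Split(0, 0.8, Leaf(3), Leaf(4))))
tree_b = Split(1, 0.5, Leaf(0), Leaf(1))
trees = [tree_a, tree_b]

X = [[0.2, 0.1], [0.9, 0.9], [0.6, 0.4]]
gamma = gbdt_embedding(trees, X)  # 3 rows, 5 + 2 = 7 binary columns

for row in gamma:
    print(row)
```

Each row contains exactly one 1 per tree, so the matrix becomes sparser as the ensemble grows. With a trained model, the leaf indices would typically come from the library itself (for example, scikit-learn's tree ensembles expose them through `apply`) and the result would be stored in a sparse matrix format rather than nested lists.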