iLTM: Integrated Large Tabular Model
David Bonet, Marçal Comajoan Cara, Alvaro Calafell, Daniel Mas Montserrat, Alexander G. Ioannidis
TL;DR
iLTM targets the persistent gap in tabular learning between gradient-boosted decision trees (GBDTs) and neural foundation models by integrating GBDT embeddings, a meta-trained hypernetwork, retrieval augmentation, and MLP backbones into a single architecture. Pretrained on 1,806 real-world classification datasets, iLTM performs strongly on both classification and regression tasks, and it transfers from classification to regression with light fine-tuning because the hypernetwork generates weights per dataset. Key contributions are the neural-tree hybrid design, large-scale meta-training, cross-task transfer, and an open-source implementation, which together reduce the need for task-specific tuning. The approach combines tree-based inductive biases with neural flexibility, offering a practical path toward tabular foundation models that generalize across datasets of varied size and dimensionality, including high-dimensional biomedical data and industry-grade benchmarks such as TabReD.
Abstract
Tabular data underpins decisions across science, industry, and public services. Despite rapid progress, advances in deep learning have not fully carried over to the tabular domain, where gradient-boosted decision trees (GBDTs) remain a default choice in practice. We present iLTM, an integrated Large Tabular Model that unifies tree-derived embeddings, dimensionality-agnostic representations, a meta-trained hypernetwork, multilayer perceptrons (MLPs), and retrieval within a single architecture. Pretrained on more than 1,800 heterogeneous classification datasets, iLTM achieves consistently superior performance across tabular classification and regression tasks, from small datasets to large and high-dimensional tasks. After light fine-tuning, the meta-trained hypernetwork transfers to regression targets, matching or surpassing strong baselines. Extensive experiments show that iLTM outperforms well-tuned GBDTs and leading deep tabular models while requiring less task-specific tuning. By bridging the gap between tree-based and neural methods, iLTM offers a new framework for tabular foundation models for robust, adaptable, and scalable tabular learning.
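To make "tree-derived embeddings" and "retrieval" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy ensemble of hand-written decision stumps standing in for a fitted GBDT, encodes each row as concatenated one-hot leaf indicators, and retrieves neighbors by counting shared leaves. The names `Stump`, `leaf_embedding`, and `retrieve` are hypothetical and chosen only for illustration; the similarity measure is likewise an assumption.

```python
# Illustrative sketch only: hand-written stumps stand in for a fitted GBDT.
from typing import List, Tuple

Stump = Tuple[int, float]  # (feature index, threshold); hypothetical toy "tree"


def leaf_embedding(row: List[float], stumps: List[Stump]) -> List[int]:
    """Concatenate one one-hot leaf indicator per stump (two leaves each)."""
    embedding: List[int] = []
    for feature, threshold in stumps:
        goes_left = row[feature] <= threshold
        embedding.extend([1, 0] if goes_left else [0, 1])
    return embedding


def retrieve(query: List[int], bank: List[List[int]], k: int) -> List[int]:
    """Return indices of the k bank rows that share the most leaves with the query."""
    shared = [sum(q & b for q, b in zip(query, row)) for row in bank]
    # sorted() is stable, so ties keep their original bank order.
    return sorted(range(len(bank)), key=lambda i: -shared[i])[:k]


# Toy data: three stumps over two features, three "training" rows.
stumps = [(0, 0.5), (1, 10.0), (0, 2.0)]
train = [[0.2, 5.0], [1.0, 12.0], [3.0, 8.0]]

bank = [leaf_embedding(r, stumps) for r in train]
query = leaf_embedding([0.4, 6.0], stumps)
neighbors = retrieve(query, bank, k=2)
```

In this sketch, rows that fall into the same leaves receive overlapping embeddings, so retrieval in embedding space reflects the partitioning learned by the trees. A real pipeline would obtain leaf indices from a trained GBDT rather than from fixed thresholds.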
