UniPredict: Large Language Models are Universal Tabular Classifiers
Ruiyu Wang, Zifeng Wang, Jimeng Sun
TL;DR
UniPredict addresses the fixed-target limitation of conventional tabular predictors by proposing a universal tabular modeling paradigm based on large language models. It trains a single GPT-2-based predictor on 169 diverse datasets, using prompt engineering and target augmentation to enable predictions for arbitrary targets, and compares its performance against dataset-specific baselines. The framework reports relative gains of 13.4% over the best neural baseline and 5.4% over the best tree-boosting baseline, while showing strong few-shot adaptability on 62 unseen datasets and robustness in low-resource settings. The work demonstrates the feasibility and value of universal, instruction-tuned tabular prediction at scale, and offers practical insights into metadata quality, context window limits, and feature value cleanliness for deployment.
Abstract
Tabular data prediction is a fundamental machine learning task for many applications. Existing methods predominantly employ discriminative modeling and assume a fixed target column, which requires re-training for every new predictive task. Inspired by the generative power of large language models (LLMs), this paper explores the idea of building universal tabular data predictors based on generative modeling, namely UniPredict. We demonstrate that an LLM can scale to extensive tabular datasets, enabling it to comprehend diverse tabular inputs and predict target variables following the provided instructions. Specifically, we train a single LLM on an aggregation of 169 tabular datasets with diverse targets and compare its performance against baselines that are trained on each dataset separately. We observe that this versatile UniPredict model outperforms the other models, with an advantage of 5.4% over the best tree-boosting baseline and 13.4% over the best neural network baseline. We further test UniPredict in few-shot learning settings on another 62 tabular datasets, where our method achieves strong performance in quickly adapting to new tasks. In the low-resource few-shot setup, we observe a performance advantage of more than 100% over XGBoost and a significant margin over all baselines. We envision that UniPredict sheds light on developing a universal tabular data prediction system that learns from data at scale and serves a wide range of prediction tasks.
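
As a rough illustration of what predicting arbitrary targets "following the provided instructions" might look like, the sketch below serializes one table row into a text prompt in which any column can be named as the target. The function name `serialize_row`, the prompt wording, and the example columns are assumptions made here for illustration; they are not taken from the UniPredict implementation.

```python
def serialize_row(row, target, metadata=""):
    """Turn one table row into an instruction-style prompt.

    Hypothetical format: the exact wording is an assumption,
    not the prompt template used in the paper.
    """
    # Describe every column except the one we want the model to predict.
    features = "; ".join(
        f"{col} is {val}" for col, val in row.items() if col != target
    )
    parts = []
    if metadata:
        # Optional free-text description of the dataset.
        parts.append(f"Dataset description: {metadata}")
    parts.append(f"Features: {features}.")
    parts.append(f"Instruction: predict the value of '{target}'.")
    return "\n".join(parts)


# Illustrative row; the column names and values are made up.
row = {"age": 42, "education": "Bachelors", "income": ">50K"}

# The same row can be posed with different targets by changing the instruction.
prompt_income = serialize_row(row, target="income", metadata="Census records")
prompt_education = serialize_row(row, target="education")

print(prompt_income)
print(prompt_education)
```

Under these assumptions, switching the prediction target only changes the prompt text rather than the model, which is the property that lets a single generative predictor serve several tasks without re-training per target column.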
