Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data
Rishabh Ranjan, Valter Hudovernik, Mark Znidar, Charilaos Kanatsoulis, Roshan Upendra, Mahmoud Mohammadi, Joe Meyer, Tom Palczewski, Carlos Guestrin, Jure Leskovec
TL;DR
The Relational Transformer (RT) addresses the absence of foundation models for relational databases by introducing cell-level tokenization, integration of task tables into the model input, and Relational Attention, which explicitly encodes column, row, and foreign-key structure. Pretrained on RelBench, RT transfers zero-shot to unseen datasets and tasks. With only 22M parameters, it averages about 93% of fully supervised AUROC on binary classification, and it is highly sample-efficient when fine-tuned. Given the same input, it outperforms much larger LLM baselines, and it generalizes robustly across diverse relational schemas. This work offers a practical, scalable path toward foundation models for relational data, with direct relevance to enterprise predictive analytics and rapid deployment across heterogeneous databases.
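To make cell-level tokenization and task-table integration concrete, here is a minimal Python sketch under stated assumptions. `CellToken`, `tokenize_row`, `task_row_tokens`, the table names, and the toy values are hypothetical illustrations, not RT's actual code. The sketch assumes that each non-missing cell becomes one token carrying its table name, column name, and row id, and that a prediction task is posed as an extra task-table row whose label cell is masked, so predicting it is an instance of masked token prediction.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical sketch: one token per cell, tagged with table/column metadata.
@dataclass(frozen=True)
class CellToken:
    table: str
    column: str
    row: int               # global row id shared by all cells of one row
    value: Optional[Any]   # None here marks a masked cell to be predicted
    is_masked: bool = False


def tokenize_row(table: str, row_id: int, row: dict[str, Any]) -> list[CellToken]:
    """Turn one database row into cell tokens, skipping missing (None) values."""
    return [
        CellToken(table=table, column=col, row=row_id, value=val)
        for col, val in row.items()
        if val is not None
    ]


def task_row_tokens(
    task_table: str, row_id: int, features: dict[str, Any], label_col: str
) -> list[CellToken]:
    """Add a (hypothetical) task-table row whose label cell is masked for prediction."""
    tokens = tokenize_row(task_table, row_id, features)
    tokens.append(
        CellToken(table=task_table, column=label_col, row=row_id, value=None, is_masked=True)
    )
    return tokens


# Hypothetical toy example: one customer row plus one churn-task row.
tokens = tokenize_row("customers", 0, {"customer_id": 7, "country": "DE", "age": None})
tokens += task_row_tokens(
    "churn_task", 1, {"customer_id": 7, "timestamp": "2024-01-01"}, label_col="churn"
)
for t in tokens:
    print(t.table, t.column, t.value, t.is_masked)
```

The missing `age` cell produces no token, and the only masked token is the `churn` label in the task row. A real model would additionally embed each token's value and metadata before it enters the transformer, which this sketch omits.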
Abstract
Pretrained transformers readily adapt to new sequence modeling tasks via zero-shot prompting, but relational domains still lack architectures that transfer across datasets and tasks. The core challenge is the diversity of relational data, with heterogeneous schemas, graph structures, and functional dependencies. In this paper, we present the Relational Transformer (RT) architecture, which can be pretrained on diverse relational databases and directly applied to unseen datasets and tasks without task- or dataset-specific fine-tuning, or retrieval of in-context examples. RT (i) tokenizes cells with table/column metadata, (ii) is pretrained via masked token prediction, and (iii) utilizes a novel Relational Attention mechanism over columns, rows, and primary-foreign key links. Pretrained on RelBench datasets spanning tasks such as churn and sales forecasting, RT attains strong zero-shot performance, averaging 93% of fully supervised AUROC on binary classification tasks with a single forward pass of a 22M parameter model, as opposed to 84% for a 27B LLM. Fine-tuning yields state-of-the-art results with high sample efficiency. Our experiments show that RT's zero-shot transfer harnesses task-table context, relational attention patterns, and schema semantics. Overall, RT provides a practical path toward foundation models for relational data.
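One way to picture attention "over columns, rows, and primary-foreign key links" is as a set of boolean masks that restrict which cell tokens may attend to which. The NumPy sketch below is an illustrative assumption, not RT's published definition: the toy `tokens` and `fk_links`, the helpers `build_masks` and `masked_attention`, the three mask types, their order, and the residual stacking are all hypothetical, and the paper's exact attention variants may differ. Each mask also lets a token attend to itself so that no softmax row is empty.

```python
import numpy as np

# Hypothetical toy cell tokens: (table, column, row_id).
tokens = [
    ("customers", "customer_id", 0),
    ("customers", "country", 0),
    ("orders", "customer_id", 1),
    ("orders", "amount", 1),
    ("orders", "customer_id", 2),
    ("orders", "amount", 2),
]
# Hypothetical primary-foreign key links between rows: (child_row, parent_row).
fk_links = {(1, 0), (2, 0)}


def build_masks(tokens, fk_links):
    """Boolean [n, n] masks: entry [i, j] says whether token i may attend to token j."""
    n = len(tokens)
    column = np.zeros((n, n), dtype=bool)
    row = np.zeros((n, n), dtype=bool)
    fk = np.zeros((n, n), dtype=bool)
    for i, (ti, ci, ri) in enumerate(tokens):
        for j, (tj, cj, rj) in enumerate(tokens):
            column[i, j] = ti == tj and ci == cj  # same column of the same table
            row[i, j] = ri == rj                  # same row, any column
            # Rows linked by a foreign key, in either direction.
            fk[i, j] = (ri, rj) in fk_links or (rj, ri) in fk_links
    fk |= np.eye(n, dtype=bool)  # self-attention keeps every softmax row non-empty
    return {"column": column, "row": row, "fk": fk}


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention restricted to allowed pairs."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)         # block disallowed pairs
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(len(tokens), d))
masks = build_masks(tokens, fk_links)
h = x
for name in ("column", "row", "fk"):
    h = h + masked_attention(h, h, h, masks[name])  # residual update per attention type
print(h.shape)  # (6, 8)
```

Restricting attention with masks, rather than building a separate graph neural network, keeps the model a plain transformer over a flat sequence of cell tokens while still exposing the relational structure.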
