PyTorch Frame: A Modular Framework for Multi-Modal Tabular Learning
Weihua Hu, Yiwen Yuan, Zecheng Zhang, Akihiro Nitta, Kaidi Cao, Vid Kocijan, Jinu Sunil, Jure Leskovec, Matthias Fey
TL;DR
PyTorch Frame addresses learning from multi-modal tabular data. It introduces Tensor Frame, a data structure that maps raw tables into per-column tensors, together with a modular encoder–combiner–decoder pipeline that builds column embeddings and refines them through column-wise interactions to produce row representations. External foundation models can be plugged in for text and image columns, and integration with PyG supports end-to-end learning with Graph Neural Networks on relational data. Empirically, the framework shows strong gains on tabular tasks involving text and relational data while remaining competitive on conventional numerical/categorical datasets. It is intended as a flexible, extensible toolkit for research and deployment of deep tabular learning in real-world multi-modal and relational settings.
Abstract
We present PyTorch Frame, a PyTorch-based framework for deep learning over multi-modal tabular data. PyTorch Frame makes tabular deep learning easy by providing a PyTorch-based data structure to handle complex tabular data, introducing a model abstraction to enable modular implementation of tabular models, and allowing external foundation models to be incorporated to handle complex columns (e.g., LLMs for text columns). We demonstrate the usefulness of PyTorch Frame by implementing diverse tabular models in a modular way, successfully applying these models to complex multi-modal tabular data, and integrating our framework with PyTorch Geometric, a PyTorch library for Graph Neural Networks (GNNs), to perform end-to-end learning over relational databases.
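To make the modular model abstraction concrete, the following is a minimal sketch in plain PyTorch of the three stages: encode each column into an embedding, let columns interact, and decode a row-level output. It does not use the PyTorch Frame API. The class name `ToyTabularModel`, the toy table, and all hyperparameters are hypothetical and chosen only for illustration, and self-attention is just one possible choice of column-wise interaction.

```python
import torch
from torch import nn

# Hypothetical toy table: two numerical columns and one categorical column.
table = {
    "age": [25.0, 40.0, 31.0],
    "income": [3.2, 5.1, 4.0],
    "city": ["paris", "tokyo", "paris"],
}

# Materialization: group columns by type into one tensor per type.
x_num = torch.tensor([table["age"], table["income"]]).T  # [rows, 2], float
vocab = {name: idx for idx, name in enumerate(sorted(set(table["city"])))}
x_cat = torch.tensor([[vocab[c]] for c in table["city"]])  # [rows, 1], long


class ToyTabularModel(nn.Module):
    """Illustrative encoder -> column-wise interaction -> decoder model.

    Assumes at least one numerical and one categorical column.
    """

    def __init__(self, num_numerical, cat_cardinalities, channels=16, out_channels=1):
        super().__init__()
        # Encoder: a per-column affine map for numerical columns and
        # a per-column embedding table for categorical columns.
        self.num_weight = nn.Parameter(torch.randn(num_numerical, channels))
        self.num_bias = nn.Parameter(torch.zeros(num_numerical, channels))
        self.cat_embeddings = nn.ModuleList(
            [nn.Embedding(cardinality, channels) for cardinality in cat_cardinalities]
        )
        # Column-wise interaction: self-attention over the column axis.
        self.interaction = nn.TransformerEncoderLayer(
            d_model=channels, nhead=4, dim_feedforward=2 * channels, batch_first=True
        )
        # Decoder: mean-pool over columns, then project to the output size.
        self.decoder = nn.Linear(channels, out_channels)

    def forward(self, x_num, x_cat):
        # [rows, num_numerical, 1] * [num_numerical, channels] broadcasts to
        # [rows, num_numerical, channels].
        num_emb = x_num.unsqueeze(-1) * self.num_weight + self.num_bias
        cat_emb = torch.stack(
            [emb(x_cat[:, i]) for i, emb in enumerate(self.cat_embeddings)], dim=1
        )
        x = torch.cat([num_emb, cat_emb], dim=1)  # [rows, columns, channels]
        x = self.interaction(x)                   # columns attend to each other
        return self.decoder(x.mean(dim=1))        # [rows, out_channels]


torch.manual_seed(0)
model = ToyTabularModel(num_numerical=2, cat_cardinalities=[len(vocab)])
model.eval()
with torch.no_grad():
    out = model(x_num, x_cat)  # one prediction per table row
```

Because every column is reduced to an embedding of the same width before the interaction stage, a text or image column could in principle enter the same pipeline through an external model that outputs a fixed-size vector; that extension is not shown here.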
