TabGLM: Tabular Graph Language Model for Learning Transferable Representations Through Multi-Modal Consistency Minimization

Anay Majee, Maria Xenochristou, Wei-Peng Chen

TL;DR

TabGLM addresses heterogeneity in tabular data by learning per-row representations from two modalities: a graph that captures structural relationships and serialized text that captures semantic information. It encodes the row graph $G_i$ with a graph neural network and the serialized text $T_i^{ser}$ with a frozen TAPAS-based text encoder, and MuCosa aligns the two embeddings through the joint objective $L = (1-\lambda)L_{supervised} + \lambda L_{consistency}$, where $\lambda$ weights the consistency term against the supervised term. Across 25 datasets, the approach achieves an average AUROC improvement of up to $5.56\%$ over SoTA tabular DL methods while using significantly fewer parameters, and ablations confirm the benefits of multi-modal fusion and of the TAPAS text encoder. This multi-modal, semi-supervised framework shows strong performance and transferability on heterogeneous tabular datasets, which suggests practical value in domains that must handle mixed feature types robustly.
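
The joint objective above can be sketched as a convex combination of two scalar loss values. This is a minimal illustration, not the authors' implementation: the function name `joint_loss` and the example loss values are hypothetical, and the way the supervised and consistency terms are themselves computed is not specified in this summary.

```python
def joint_loss(supervised: float, consistency: float, lam: float) -> float:
    """Combine two loss values as L = (1 - lam) * L_supervised + lam * L_consistency.

    `supervised` and `consistency` are assumed to be already-computed scalar
    losses; `lam` is the weight on the consistency term.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * supervised + lam * consistency


# Hypothetical loss values, chosen only to show the weighting.
total = joint_loss(supervised=2.0, consistency=4.0, lam=0.25)
print(total)
```

Setting `lam` to 0 recovers the purely supervised loss, and setting it to 1 trains on the consistency term alone.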

Abstract

Handling heterogeneous data in tabular datasets poses a significant challenge for deep learning models. While attention-based architectures and self-supervised learning have achieved notable success, their application to tabular data remains less effective than linear and tree-based models. Although several breakthroughs have been achieved by models that transform tables into uni-modal representations such as images, language, and graphs, these models often underperform in the presence of feature heterogeneity. To address this gap, we introduce TabGLM (Tabular Graph Language Model), a novel multi-modal architecture designed to model both structural and semantic information from a table. TabGLM transforms each row of a table into a fully connected graph and serialized text, which are then encoded using a graph neural network (GNN) and a text encoder, respectively. By aligning these representations through a joint, multi-modal, self-supervised learning objective, TabGLM leverages complementary information from both modalities, thereby enhancing feature learning. TabGLM's flexible graph-text pipeline efficiently processes heterogeneous datasets with significantly fewer parameters than existing deep learning approaches. Evaluations across 25 benchmark datasets demonstrate substantial performance gains, with TabGLM achieving an average AUC-ROC improvement of up to 5.56% over State-of-the-Art (SoTA) tabular learning methods.
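
The two per-row views described in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's pipeline: the "The <column> is <value>." template, the choice of one node per column, undirected edges without self-loops, and the helper names `serialize_row` and `fully_connected_edges` are all hypothetical.

```python
from itertools import combinations


def serialize_row(row: dict) -> str:
    """Turn one table row into text using an assumed 'The <column> is <value>.' template."""
    return " ".join(f"The {col} is {val}." for col, val in row.items())


def fully_connected_edges(num_nodes: int) -> list:
    """List every undirected edge (i, j) with i < j between num_nodes nodes."""
    return list(combinations(range(num_nodes), 2))


# Hypothetical row mixing numerical and categorical features.
row = {"age": 42, "occupation": "nurse", "income": 58000.0}

text = serialize_row(row)                 # input for the text encoder
edges = fully_connected_edges(len(row))   # one node per column, assumed

print(text)
print(edges)
```

A row with $n$ columns yields $n(n-1)/2$ undirected edges under this assumption, so the graph view grows quadratically with the number of features.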

Paper Structure

This paper contains 18 sections, 5 equations, 3 figures, 9 tables.

Figures (3)

  • Figure 1: Semi-Supervised Multi-Modal Tabular Deep Learning in TabGLM. We propose a joint graph-language method that can effectively learn from heterogeneous, real-world tabular datasets by integrating structural and semantic information.
  • Figure 2: Overview of our TabGLM framework. TabGLM introduces Multi-modal Graph-Language Modeling to enable tabular learning on datasets with heterogeneous data types. Our method leverages graph and language embeddings, consistency regularization, and supervised learning to effectively adapt to diverse real-world downstream tasks.
  • Figure 3: Multi-Modal Representation of Tables in TabGLM, depicting the text serialization and graph pre-processing pipelines.