A Survey on Deep Tabular Learning

Shriyank Somvanshi, Subasish Das, Syed Aaqib Javed, Gian Antariksa, Ahmed Hossain

TL;DR

This survey addresses the core problem of learning from tabular data with deep models by tracing the evolution from traditional FCNs and shallow nets to attention-based, hybrid, and graph-informed architectures. It surveys key methods (TabNet, SAINT, TabTransformer, FT-Transformer, DeepGBM, DANets, TaBERT, GNN4TDL, and related innovations), highlighting how attention, embeddings, and hybrid designs improve handling of heterogeneous features, non-spatial relations, and limited data. The work also covers training strategies (data augmentation, cross-validation, transfer learning) and future directions in explainability and self-supervised learning, emphasizing practical impacts in healthcare, finance, and transportation. Overall, the paper underscores progress toward scalable, interpretable, and robust tabular deep learning, while calling for comprehensive evaluation across datasets and domains to solidify best practices and deployment readiness.

Abstract

Tabular data, widely used in industries like healthcare, finance, and transportation, presents unique challenges for deep learning due to its heterogeneous nature and lack of spatial structure. This survey reviews the evolution of deep learning models for tabular data, from early fully connected networks (FCNs) to advanced architectures like TabNet, SAINT, TabTranSELU, and MambaNet. These models incorporate attention mechanisms, feature embeddings, and hybrid architectures to address tabular data complexities. TabNet uses sequential attention for instance-wise feature selection, improving interpretability, while SAINT combines self-attention and intersample attention to capture complex interactions across features and data points, both advancing scalability and reducing computational overhead. Hybrid architectures such as TabTransformer and FT-Transformer integrate attention mechanisms with multi-layer perceptrons (MLPs) to handle categorical and numerical data, with FT-Transformer adapting transformers for tabular datasets. Research continues to balance performance and efficiency for large datasets. Graph-based models like GNN4TDL and GANDALF combine neural networks with decision trees or graph structures, enhancing feature representation and mitigating overfitting in small datasets through advanced regularization techniques. Diffusion-based models like the Tabular Denoising Diffusion Probabilistic Model (TabDDPM) generate synthetic data to address data scarcity, improving model robustness. Similarly, models like TabPFN and Ptab leverage pre-trained language models, incorporating transfer learning and self-supervised techniques into tabular tasks. This survey highlights key advancements and outlines future research directions on scalability, generalization, and interpretability in diverse tabular data applications.
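The distinction the abstract draws between attention across features and intersample attention across data points can be illustrated with a minimal sketch. This is not SAINT's implementation: the attention here has no learned projections (queries, keys, and values are the embeddings themselves), and the function names `attend`, `feature_attention`, and `intersample_attention`, along with the toy embeddings, are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(tokens):
    """Scaled dot-product self-attention over a list of equal-length vectors.

    Assumption: no learned weights; each token serves as its own query,
    key, and value, so only the mixing pattern is illustrated.
    """
    dim = len(tokens[0])
    mixed = []
    for query in tokens:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in tokens])
        mixed.append(
            [sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(dim)]
        )
    return mixed


def feature_attention(batch):
    """Attend across the feature embeddings inside each row independently."""
    return [attend(row) for row in batch]


def intersample_attention(batch):
    """Attend across rows: flatten each row's embeddings into one vector,
    mix the rows with each other, then restore the per-feature layout."""
    n_features = len(batch[0])
    dim = len(batch[0][0])
    flat_rows = [[x for feature in row for x in feature] for row in batch]
    mixed_rows = attend(flat_rows)
    return [
        [vec[i * dim:(i + 1) * dim] for i in range(n_features)]
        for vec in mixed_rows
    ]


# Toy batch: 3 rows, 2 features per row, each feature embedded in 2 dimensions.
batch = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
per_row = feature_attention(batch)        # row 0 depends only on row 0
across_rows = intersample_attention(batch)  # row 0 depends on every row in the batch
```

In this sketch, changing one row leaves the `feature_attention` output of the other rows untouched, while the `intersample_attention` output of every row shifts, which is the sense in which the second mechanism captures interactions across data points.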

Paper Structure

This paper contains 32 sections and 18 figures.

Figures (18)

  • Figure 1: Progression of Tabular Deep Learning Models
  • Figure 2: An Illustration of 1D (left) and 2D (right) Tabular Dataset
  • Figure 3: Demonstration of TransTab Tasks [wang2022transtab]
  • Figure 4: TransTab Framework [wang2022transtab]
  • Figure 5: Transfer Learning in Pre-Trained-TLCNN [zhao2017research]
  • ...and 13 more figures