Tab-PET: Graph-Based Positional Encodings for Tabular Transformers

Yunze Leng, Rohan Ghosh, Mehul Motani

TL;DR

This paper tackles the lack of intrinsic structure in tabular data by introducing Tab-PET, a graph-based framework that generates fixed positional encodings (PEs) from the Laplacian eigenvectors of feature graphs estimated with association- or causality-based methods. By concatenating these encodings with standard embeddings, Tab-PET provides a structured inductive bias that reduces the effective rank of transformer embeddings, improving generalization across 50 tabular datasets and multiple transformer backbones. The authors establish theoretical results linking PEs to rank reduction and demonstrate substantial empirical gains: association-based graph estimates (especially Spearman-based ones) outperform causality-based ones, and fixed PEs outperform learnable PEs in low-data regimes. Overall, Tab-PET offers a practical mechanism for harnessing data structure in tabular transformers, improving accuracy and robustness while maintaining parameter efficiency.
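The association-based PE construction can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: edge weights are absolute Spearman rank correlations, self-loops are dropped, the symmetric normalized Laplacian is used, and the trivial first eigenvector is discarded. The names `spearman_matrix` and `laplacian_pe` are hypothetical. Row $j$ of the returned matrix serves as the PE of feature $j$.

```python
import numpy as np


def spearman_matrix(X: np.ndarray) -> np.ndarray:
    """Rank correlation between the columns of X (n_samples x n_features).

    Ranks come from argsort-of-argsort, so ties are broken by position
    instead of averaged; this simplifies textbook Spearman correlation.
    """
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def laplacian_pe(X: np.ndarray, k: int) -> np.ndarray:
    """Return an (n_features x k) matrix whose row j is the PE of feature j."""
    # Assumed edge weights: absolute rank correlation, with no self-loops.
    A = np.abs(spearman_matrix(X))
    np.fill_diagonal(A, 0.0)
    # Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh sorts eigenvalues in ascending order; drop the first (trivial)
    # eigenvector (an assumption) and keep the next k smoothest ones.
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1 : k + 1]


# Usage on synthetic data where features 0 and 1 are strongly associated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)
P = laplacian_pe(X, k=3)
print(P.shape)  # (6, 3)
```

Using ranks rather than raw values makes the graph sensitive to any monotone relationship between features, which is one plausible reason a Spearman-based estimate would suit heterogeneous tabular columns.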

Abstract

Supervised learning with tabular data presents unique challenges, including small dataset sizes, the absence of structural cues, and heterogeneous features spanning both categorical and continuous domains. Unlike vision and language tasks, where models can exploit inductive biases in the data, tabular data lacks inherent positional structure, hindering the effectiveness of self-attention mechanisms. While recent transformer-based models such as TabTransformer, SAINT, and FT-Transformer (which we refer to as 3T) have shown promise on tabular data, they typically operate without structural cues such as positional encodings (PEs), since no prior structural information is usually available. In this work, we find both theoretically and empirically that structural cues, specifically PEs, can improve the generalization performance of tabular transformers. We find that PEs can reduce the effective rank (a form of intrinsic dimensionality) of the features, simplifying the task by reducing the dimensionality of the problem and thereby improving generalization. To that end, we propose Tab-PET (PEs for Tabular Transformers), a graph-based framework for estimating PEs and incorporating them into embeddings. Inspired by approaches that derive PEs from graph topology, we explore two paradigms for graph estimation: association-based and causality-based. We empirically demonstrate that graph-derived PEs significantly improve the performance of 3T across 50 classification and regression datasets. Notably, association-based graphs consistently yield more stable and pronounced gains than causality-based ones. Our work highlights an unexpected role of PEs in tabular transformers, revealing how they can be harnessed to improve generalization.
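The effective rank mentioned in the abstract can be computed as below, assuming Definition 1 follows the common entropy-based definition of Roy and Vetterli (2007): normalize the singular values into a probability vector $p$ and take $\exp(-\sum_k p_k \log p_k)$. The toy loop is hypothetical data only: it appends $\alpha$-scaled orthonormal columns (random stand-ins for PEs) to a random token matrix. As $\alpha$ grows, the appended columns dominate the spectrum, which can concentrate singular-value mass on a few directions. It illustrates the quantity, not the paper's theorem.

```python
import numpy as np


def effective_rank(M: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank (Roy & Vetterli, 2007) of matrix M."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / max(s.sum(), eps)   # singular values as a probability vector
    p = p[p > eps]              # treat 0 * log(0) as 0
    return float(np.exp(-np.sum(p * np.log(p))))


# Toy illustration (hypothetical data): append alpha-scaled orthonormal columns,
# random stand-ins for PEs, to a random token matrix and track r_eff.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16))               # 8 feature tokens, 16-dim each
pe, _ = np.linalg.qr(rng.normal(size=(8, 3)))   # (8, 3) orthonormal columns
for alpha in (0.0, 1.0, 10.0, 100.0):
    T = np.hstack([tokens, alpha * pe])
    print(f"alpha={alpha:>5}: r_eff={effective_rank(T):.2f}")
```

An identity matrix of size $n$ has effective rank exactly $n$, and any rank-1 matrix has effective rank 1, so the measure interpolates smoothly between these extremes.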

Paper Structure

This paper contains 76 sections, 4 theorems, 48 equations, 11 figures, 22 tables, 2 algorithms.

Key Result

Theorem 1

[Effective Rank under Random Inputs] Let $x \in \mathbb{R}^d$ be an input vector to a single-layer, single-head FT-Transformer whose components $x_i \in (0,1)$ are i.i.d. Let $d_T$ denote the token dimension (inclusive of concatenated positional encodings). Let $q \in \mathbb{R}^{d_T}$ … Then the effective rank $r_{\mathrm{eff}}$ of the CLS token output after self-attention satisfies …
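To make the setting of Theorem 1 concrete, the display below writes out one plausible form of the tokens and the CLS attention output. It assumes FT-Transformer's standard feature tokenizer for numerical features ($b_j + x_j w_j$, with learned $w_j, b_j \in \mathbb{R}^{d_e}$), and that the $\alpha$-scaled PE row $p_j \in \mathbb{R}^{k}$ of feature $j$ is concatenated to its token so that $d_T = d_e + k$. It further assumes that $q$ is the CLS token's query. $W_K, W_V \in \mathbb{R}^{d_T \times d_T}$ are hypothetical key and value projections, and the CLS token's attention to itself is omitted for brevity. This is a sketch of the setting, not the paper's exact formulation.

$$ t_j = \big[\, b_j + x_j w_j \,;\ \alpha\, p_j \,\big] \in \mathbb{R}^{d_T}, \qquad j = 1, \dots, d $$

$$ o_{\mathrm{CLS}} = \sum_{j=1}^{d} a_j\, W_V t_j, \qquad a_j = \frac{\exp\!\big(q^\top W_K t_j / \sqrt{d_T}\big)}{\sum_{l=1}^{d} \exp\!\big(q^\top W_K t_l / \sqrt{d_T}\big)} $$

Under these assumptions, the PE block $\alpha p_j$ is identical for every sample, so for large $\alpha$ it contributes a shared, input-independent component to every $t_j$. This gives some intuition for how PEs could lower the effective rank of $o_{\mathrm{CLS}}$ across inputs.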

Figures (11)

  • Figure 1: Tab-PET framework for integrating PEs in tabular transformers. (a) Categorical features are one-hot encoded and continuous features are normalized. (b) A feature-wise graph is estimated from inter-feature dependencies, capturing the relational structure among features. (c) Graph Laplacian eigenvectors (examples shown) are extracted to form fixed PEs and scaled by the hyperparameter $\alpha$, which controls their relative importance. (d) These encodings are concatenated with the standard embeddings and fed into the transformer layers.
  • Figure 2: RMSE comparison on synthetic datasets with low, moderate, and high structure across varying $\alpha$.
  • Figure 3: Graph entropy distributions across five graph estimation approaches. Varying bar widths reflect the different value ranges spanned by each approach.
  • Figure 4: Empirical validation of effective rank reduction. Main plot shows the mean effective rank vs. $\alpha$ for Tab-PET, Random PE, and baseline. Inset shows per-dataset trends.
  • Figure 5: Empirical validation of the effective rank reduction theory. The left panel shows individual dataset trends across 15 tabular datasets for Tab-PET (orange solid lines) and Random PE (gray dashed lines), compared with baseline models without PEs (blue scatter points at $\alpha=0$). The right panel shows the mean effective rank across all datasets.
  • ...and 6 more figures
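Steps (c) and (d) of Figure 1 can be sketched as follows. The sketch assumes NumPy arrays of shape (batch, n_features, d_emb) for the standard embeddings. It also uses a hypothetical path graph in place of an estimated feature graph and takes the combinatorial Laplacian $D - A$. `concat_pe` is an illustrative name, not the authors' API.

```python
import numpy as np


def concat_pe(tokens: np.ndarray, pe: np.ndarray, alpha: float) -> np.ndarray:
    """Concatenate alpha-scaled fixed PEs to per-feature token embeddings.

    tokens: (batch, n_features, d_emb) standard feature embeddings
    pe:     (n_features, k) fixed PE matrix, row j belongs to feature j
    returns (batch, n_features, d_emb + k)
    """
    # The same fixed (non-learnable) PE is broadcast to every sample.
    pe_b = np.broadcast_to(alpha * pe, (tokens.shape[0],) + pe.shape)
    return np.concatenate([tokens, pe_b], axis=-1)


# Toy feature graph (hypothetical): a path 0-1-2-3-4 over five features.
n_features, d_emb, k = 5, 8, 2
A = np.diag(np.ones(n_features - 1), 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A         # combinatorial Laplacian D - A
_, vecs = np.linalg.eigh(L)
pe = vecs[:, 1 : k + 1]                # skip the constant eigenvector
tokens = np.random.default_rng(0).normal(size=(4, n_features, d_emb))
out = concat_pe(tokens, pe, alpha=0.5)
print(out.shape)  # (4, 5, 10)
```

Because the PE is fixed, it widens each token by $k$ channels without adding trainable parameters; $\alpha$ is the only knob that sets how strongly the graph structure competes with the learned embeddings.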

Theorems & Definitions (9)

  • Definition 1: Effective Rank
  • Theorem 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Theorem 1: Effective Rank under Random Inputs
  • Proof
  • Theorem 2: Effective Rank under Structured Inputs
  • Proof