Towards a Relationship-Aware Transformer for Tabular Data

Andrei V. Konstantinov, Valerii A. Zuev, Lev V. Utkin

TL;DR

The paper tackles regression and individualized treatment-effect estimation on tabular data by leveraging external intersample relationships. It introduces two relation-aware approaches, both designed to exploit a relationship matrix among samples: Nadaraya-Watson (NW) kernel regression with a relationship term, and the TabRel transformer with Relational Multi-Head Attention. Across synthetic and real-world regression benchmarks and a semi-synthetic treatment-effect task based on IHDP, the relation-aware NW methods often outperform baseline LightGBM variants. TabRel is competitive but not universally superior, and it requires the relationships of test samples to be known at training time. The work highlights both the potential and the limitations of incorporating graph-like intersample dependencies into tabular modeling, and outlines directions for integrating them more robustly.
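
To make the idea of "NW kernel regression with a relationship term" concrete, here is a minimal NumPy sketch. It assumes (this exact form is not given in this summary) that the relationship enters additively in the kernel logits: logits_ij = -||x_i - x_j||^2 / (2h^2) + beta * R_ij, followed by a softmax over training points. The function name `nw_relational_predict` and the scalar `beta` are hypothetical.

```python
import numpy as np

def nw_relational_predict(X_train, y_train, X_query, R_query_train,
                          bandwidth=1.0, beta=1.0):
    """Nadaraya-Watson prediction with an additive relationship term.

    Hypothetical form (an assumption, not the paper's confirmed formula):
        logits[i, j] = -||x_i - x_j||^2 / (2 * bandwidth^2) + beta * R[i, j]
        weights[i, :] = softmax(logits[i, :])
    R_query_train[i, j] is the relationship between query i and training point j.
    """
    # Squared Euclidean distance from each query point to each training point
    sq_dist = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian-kernel logits plus the (assumed) additive relationship bonus
    logits = -sq_dist / (2.0 * bandwidth ** 2) + beta * R_query_train
    # Numerically stable softmax over the training axis
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    # Prediction is the kernel-weighted average of training targets
    return weights @ y_train
```

With `beta=0` this reduces to ordinary Gaussian-kernel NW regression; a larger `beta` shifts weight toward related training samples regardless of feature distance. When predicting on the training set itself, a leave-one-out variant would also need to mask each point's self-weight (again an assumption about how such a model would be fit).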

Abstract

Deep learning models for tabular data typically do not allow imposing a graph of external dependencies between samples, although such a graph could help account for relatedness in tasks such as treatment effect estimation. Graph neural networks only consider adjacent nodes, which makes them difficult to apply to sparse graphs. This paper proposes several solutions based on a modified attention mechanism that accounts for possible relationships between data points by adding a term to the attention matrix. Our models are compared with each other and with gradient boosting decision trees on a regression task using synthetic and real-world datasets, as well as on a treatment effect estimation task using the IHDP dataset.
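
The mechanism of "adding a term to the attention matrix" can be illustrated with a single-head, intersample attention sketch in NumPy, where rows of X are samples and R is the sample-by-sample relationship matrix. It assumes (not confirmed by this summary) that the term is beta * R added to the pre-softmax scores; the scalar `beta`, the single-head setup, and the function name `relational_attention` are hypothetical, and the paper's Relational Multi-Head Attention may parameterize the term differently.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def relational_attention(X, R, W_q, W_k, W_v, beta=1.0):
    """Single-head attention across samples with an additive relationship term.

    Assumed form: alpha = softmax(Q K^T / sqrt(d_k) + beta * R), output = alpha V.
    X: (n_samples, d_in); R: (n_samples, n_samples).
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    # Standard scaled dot-product scores plus the relationship bias
    scores = Q @ K.T / np.sqrt(d_k) + beta * R
    alpha = softmax(scores, axis=-1)  # each row sums to 1
    return alpha @ V, alpha
```

Because the bias is added before the softmax, related pairs receive proportionally more attention without overriding the feature-based scores entirely unless `beta` is large.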

Paper Structure

This paper contains 31 sections, 9 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Nadaraya-Watson regression on a toy dataset: a quadratic dependency plus a hidden categorical additive term encoded in the relationship matrix (each element equals 1 for two points of the same category and 0 for points of different categories). Confidence bands are computed over 30 independently generated datasets
  • Figure 2: A choropleth map of the target variable in the Life Expectancy dataset in 2015. Light gray indicates missing values
  • Figure 3: Birds genetic diversity (Birds dataset). Each sector corresponds to an order. The outermost track shows the family and the second-to-outermost track the genus (within each order, the same color denotes the same family or genus). Orders with fewer than 9 species in the dataset are not shown
  • Figure 4: Empirical PDFs of the differences between target values for sample pairs with and without a relationship, for the Life Expectancy and Birds datasets
  • Figure 5: Relational Multi-Head Attention: attention map $\alpha$ constructed from the input matrix $X$ and the relationship matrix $R$
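
The toy setup described in Figure 1 (a quadratic dependency plus a hidden categorical additive term, with R_ij = 1 for same-category pairs and 0 otherwise) can be reproduced along these lines. The input range, number of categories, offset distribution, and noise level below are all assumptions for illustration, and `make_toy_dataset` is a hypothetical helper, not the authors' generator.

```python
import numpy as np

def make_toy_dataset(n=100, n_categories=3, noise=0.1, seed=0):
    """Toy data in the spirit of Figure 1 (all distributions are assumptions)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=n)                  # assumed input range
    categories = rng.integers(0, n_categories, size=n)  # hidden category per sample
    offsets = rng.normal(0.0, 2.0, size=n_categories)   # assumed additive term per category
    # Target: quadratic dependency + hidden categorical term + Gaussian noise
    y = x ** 2 + offsets[categories] + rng.normal(0.0, noise, size=n)
    # Relationship matrix: R[i, j] = 1 if i and j share a category, else 0
    R = (categories[:, None] == categories[None, :]).astype(float)
    return x.reshape(-1, 1), y, R, categories
```

Since the category never appears among the features, a model can only recover the additive term through R. For a train/test split, the query-to-train block is obtained by slicing, e.g. `R[np.ix_(test_idx, train_idx)]`, which also shows why test relationships must be known in advance.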