Towards a Relationship-Aware Transformer for Tabular Data
Andrei V. Konstantinov, Valerii A. Zuev, Lev V. Utkin
TL;DR
The paper tackles regression and individualized treatment-effect estimation on tabular data by leveraging external intersample relationships. It introduces two relation-aware approaches: a Nadaraya-Watson kernel regression with a relationship term and a TabRel transformer with Relational Multi-Head Attention, both designed to exploit a relationship matrix among samples. Across synthetic and real-world regression benchmarks and a semi-synthetic IHDP-based treatment-effect task, the relation-aware NW methods often outperform baseline LightGBM variants, while TabRel offers competitive but not universally superior performance and requires known test relationships at training time. The work highlights the potential and limitations of incorporating graph-like intersample dependencies into tabular modeling and outlines directions for more robust integration and future improvements.
Abstract
Deep learning models for tabular data typically do not allow for imposing a graph of external dependencies between samples, which can be useful for accounting for relatedness in tasks such as treatment effect estimation. Graph neural networks only consider adjacent nodes, making them difficult to apply to sparse graphs. This paper proposes several solutions based on a modified attention mechanism, which accounts for possible relationships between data points by adding a term to the attention matrix. Our models are compared with each other and with gradient boosting decision trees in a regression task on synthetic and real-world datasets, as well as in a treatment effect estimation task on the IHDP dataset.
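The idea of adding a relationship term to the attention matrix can be illustrated with a minimal sketch of Nadaraya-Watson regression viewed as attention. This is not the authors' implementation: the function name `relational_nw_predict`, the squared-Euclidean kernel, the temperature `tau`, and the scalar weight `gamma` on the relationship matrix `R` are all assumptions made for illustration. The abstract does not specify the exact functional form.

```python
import numpy as np

def relational_nw_predict(X_train, y_train, X_query, R, tau=1.0, gamma=1.0):
    """Nadaraya-Watson prediction with an additive relationship term.

    Hypothetical form: weights = softmax(-||x_q - x_i||^2 / tau + gamma * R[q, i]),
    where R[q, i] is an externally supplied relationship score between
    query sample q and training sample i.
    """
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # Feature-based attention logits plus the relationship term.
    logits = -d2 / tau + gamma * R
    # Numerically stable softmax over the training samples.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Prediction is the attention-weighted average of training targets.
    return weights @ y_train

# Toy usage: one query point equidistant from two training points.
X_train = np.array([[0.0], [1.0]])
y_train = np.array([0.0, 10.0])
X_query = np.array([[0.5]])

no_relation = relational_nw_predict(X_train, y_train, X_query, R=np.zeros((1, 2)))
related = relational_nw_predict(X_train, y_train, X_query, R=np.array([[0.0, 50.0]]))
```

In this toy setup the feature distances alone give both training points equal weight, while a large relationship score for the second training point shifts the prediction toward its target. Setting `gamma=0` recovers plain kernel regression under these assumptions.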
