Fine-grained Attention in Hierarchical Transformers for Tabular Time-series
Raphael Azorin, Zied Ben Houidi, Massimo Gallo, Alessandro Finamore, Pietro Michiardi
TL;DR
Fieldy tackles the limitation of coarse attention in hierarchical transformers for tabular time-series by introducing fine-grained field-wise attention across both rows and columns in a two-stage architecture. The first stage comprises dedicated row-wise and column-wise Field transformers to contextualize each field, followed by a Final transformer that models inter-field relations, yielding richer representations. Experiments on Pollution (regression) and Loan default (classification) show Fieldy outperforms state-of-the-art row-based, column-based, and single-stage baselines under equal parameter budgets, with notable gains on Pollution. The work demonstrates the practical value of fine-grained cross-field attention for predicting complex time-dependent tabular data, and provides code and models for reproducibility.
Abstract
Tabular data is ubiquitous in real-life systems. In particular, time-dependent tabular data, where rows are chronologically related, is typically used for recording historical events, e.g., financial transactions, healthcare records, or stock history. Recently, hierarchical variants of the attention mechanism of transformer architectures have been used to model tabular time-series data. First, rows (or columns) are encoded separately by computing attention between their fields. Subsequently, the encoded rows (or columns) attend to one another to model the entire tabular time-series. While efficient, this approach constrains the attention granularity and limits the model's ability to learn field-level patterns across separate rows or columns. We take a first step to address this gap by proposing Fieldy, a fine-grained hierarchical model that contextualizes fields at both the row and column levels. We compare our proposal against state-of-the-art models on regression and classification tasks using public tabular time-series datasets. Our results show that combining row-wise and column-wise attention improves performance without increasing model size. Code and data are available at https://github.com/raphaaal/fieldy.
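To make the two-stage idea concrete, the following is a minimal sketch of field-wise attention over a small table, written with the Python standard library only. It is not the authors' implementation: the attention here has no learned parameters, the function names (`self_attention`, `fieldy_like_encode`) are hypothetical, and concatenating the row-wise and column-wise views of each field is an assumption made for illustration. It only shows how each field can first be contextualized within its row and within its column, and how all fields can then attend to one another.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Parameter-free scaled dot-product self-attention.

    Each output vector is a softmax-weighted average of all input vectors.
    A real transformer layer would add learned query/key/value projections,
    multiple heads, residual connections, and feed-forward blocks.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return outputs


def fieldy_like_encode(table):
    """Hypothetical two-stage encoder; table[r][c] is a field embedding."""
    n_rows, n_cols = len(table), len(table[0])

    # Stage 1a: each field attends to the other fields of its own row.
    row_view = [self_attention(row) for row in table]

    # Stage 1b: each field attends to the same column across rows (time steps).
    columns = [[table[r][c] for r in range(n_rows)] for c in range(n_cols)]
    col_view = [self_attention(col) for col in columns]  # indexed [c][r]

    # Assumption: merge both views of a field by concatenation.
    fields = [
        row_view[r][c] + col_view[c][r]
        for r in range(n_rows)
        for c in range(n_cols)
    ]

    # Stage 2: all contextualized fields attend to one another.
    return self_attention(fields)


# Toy table: 3 rows (time steps), 2 columns, 2-dimensional field embeddings.
table = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 1.0]],
]
encoded = fieldy_like_encode(table)
print(len(encoded), len(encoded[0]))  # prints "6 4"
```

In this sketch the final stage operates on rows × columns fields rather than on whole encoded rows or columns, which is the granularity difference the abstract describes.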
