
One Transformer for All Time Series: Representing and Training with Time-Dependent Heterogeneous Tabular Data

Simone Luetto, Fabrizio Garuti, Enver Sangineto, Lorenzo Forni, Rita Cucchiara

TL;DR

The paper addresses modeling time-dependent, heterogeneous tabular data (mixed numerical and categorical features with variable row structures) by introducing UniTTab, a two-level Transformer with row-type aware embeddings and frequency-based numerical representations. It adopts a BEiT-inspired, uniform Masked Token pre-training objective, including Neighborhood Label Smoothing, to train a single model that handles all feature types. Across five diverse datasets, UniTTab consistently outperforms state-of-the-art tabular-time-series methods and common ML baselines, with larger gains for longer sequences and when leveraging pre-training. The work demonstrates the feasibility and effectiveness of a unified pre-trained foundation model approach for complex tabular data, and points toward scalable deployment in real-world finance and similar domains.

Abstract

There is growing interest in applying Deep Learning techniques to tabular data, in order to replicate the success of other Artificial Intelligence areas in this structured domain. Particularly interesting is the case in which tabular data have a time dependence, as with financial transactions. However, the heterogeneity of the tabular values, in which categorical elements are mixed with numerical items, makes this adaptation difficult. In this paper we propose a Transformer architecture to represent heterogeneous time-dependent tabular data, in which numerical features are represented using a set of frequency functions and the whole network is uniformly trained with a unique loss function.
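To make the idea of representing a numerical feature with a set of frequency functions concrete, here is a minimal sketch. The abstract does not specify the functions, so the choice of sine/cosine pairs at geometrically spaced frequencies, the function name `frequency_features`, and the parameter `num_frequencies` are all illustrative assumptions and not necessarily the paper's formulation.

```python
import numpy as np

def frequency_features(values, num_frequencies=4):
    """Encode scalars as sin/cos responses at several frequencies.

    Assumption (not taken from the paper): frequencies are 2^i * pi
    for i = 0 .. num_frequencies - 1, as in common positional encodings.
    """
    v = np.asarray(values, dtype=np.float64).reshape(-1, 1)  # shape (n, 1)
    freqs = (2.0 ** np.arange(num_frequencies)) * np.pi      # shape (L,)
    angles = v * freqs                                       # shape (n, L)
    # Concatenate sine and cosine responses: shape (n, 2L)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# Hypothetical, already normalized transaction amounts
amounts = np.array([0.0, 0.25, 0.5])
features = frequency_features(amounts, num_frequencies=4)
print(features.shape)
```

Each scalar becomes a fixed-length vector that can be fed to a Transformer alongside categorical embeddings; in such a scheme, values are typically normalized to a bounded range before encoding.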

Paper Structure (12 sections, 9 equations, 2 figures, 16 tables)


Figures (2)

  • Figure 1: An example of a time series taken from the Transaction Dataset (TabBERT). Each row is a bank transaction composed of $k = 10$ attributes. This sequence of $t = 10$ temporally consecutive transactions (rows) of the same client is a time series.
  • Figure 2: A schematic comparison between the architectures of TabBERT (a) and UniTTab (b). In both figures, $v_j$ is a numerical value. Note that in (b) the number of attributes of each row ($k_h$) is variable.