
tsGT: Stochastic Time Series Modeling With Transformer

Łukasz Kuciński, Witold Drzewakowski, Mateusz Olko, Piotr Kozakowski, Łukasz Maziarka, Marta Emilia Nowakowska, Łukasz Kaiser, Piotr Miłoś

TL;DR

tsGT is a stochastic, decoder-only transformer for time-series forecasting. It tokenizes values digit by digit, which allows non-parametric modeling of the predictive distribution within a general-purpose architecture. The model is trained with a next-token objective and evaluated with rolling-window backtesting; predictive distributions are assessed with MAD, RMSE, QL, and CRPS, plus Kupiec-based calibration tests. On the electricity, traffic, ETTm2, and weather datasets, tsGT outperforms state-of-the-art baselines on the pointwise errors (MAD, RMSE) and surpasses its stochastic peers on the distributional metrics (QL, CRPS). The work also analyzes quantile performance, backtesting results, and input-order sensitivity, showing that temporal ordering matters and that the model is not permutation-invariant. It discusses limitations such as memory complexity and the need for improved uncertainty quantification and explainability, and points to practical uses in scenario analysis and risk assessment for real-world time-series applications.
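To make the idea of digit-tokenization concrete, here is a minimal sketch in Python. It is not the paper's tokenizer: the normalization to $[0, 1)$, the base-10 encoding, the rounding rule, and the names `encode_value`, `decode_digits`, and `tokenize_series` are all assumptions made for illustration. The point it shows is that once each value becomes a short sequence of digit tokens, a categorical next-token distribution over digits induces a non-parametric distribution over quantized values.

```python
from typing import List


def encode_value(value: float, precision: int = 3) -> List[int]:
    """Quantize a value in [0, 1) to `precision` base-10 digit tokens.

    Assumption: the series has already been normalized to [0, 1).
    """
    if not 0.0 <= value < 1.0:
        raise ValueError("value must lie in [0, 1)")
    levels = 10 ** precision
    # Round to the nearest grid point, clamping so we never emit `levels`.
    scaled = min(int(round(value * levels)), levels - 1)
    digits = []
    for _ in range(precision):
        digits.append(scaled % 10)  # least significant digit first
        scaled //= 10
    return digits[::-1]  # most significant digit first


def decode_digits(digits: List[int]) -> float:
    """Invert encode_value up to the quantization error."""
    number = 0
    for d in digits:
        number = number * 10 + d
    return number / 10 ** len(digits)


def tokenize_series(series: List[float], precision: int = 3) -> List[int]:
    """Flatten a normalized series into one token stream, digit by digit."""
    tokens: List[int] = []
    for value in series:
        tokens.extend(encode_value(value, precision))
    return tokens
```

With this layout, a decoder-only model trained with a next-token objective predicts the most significant digit of a value first and then refines it, so sampling `precision` tokens yields one sampled value.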

Abstract

Time series methods are of fundamental importance in virtually any field of science that deals with temporally structured data. Recently, there has been a surge of deterministic transformer models with time series-specific architectural biases. In this paper, we go in a different direction by introducing tsGT, a stochastic time series model built on a general-purpose transformer architecture. We focus on using a well-known and theoretically justified rolling window backtesting and evaluation protocol. We show that tsGT outperforms the state-of-the-art models on MAD and RMSE, and surpasses its stochastic peers on QL and CRPS, on four commonly used datasets. We complement these results with a detailed analysis of tsGT's ability to model the data distribution and predict marginal quantile values.
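The distributional metrics named above can be estimated directly from samples drawn from a stochastic model. The sketch below uses the textbook forms of the quantile (pinball) loss and of the sample-based CRPS estimator, $\mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$. The function names are hypothetical, and the aggregation and normalization the paper applies to QL and CRPS across series and horizons are not stated in this section, so they are left out.

```python
from typing import Sequence


def quantile_loss(y: float, q_pred: float, alpha: float) -> float:
    """Pinball loss of a predicted alpha-quantile `q_pred` against outcome `y`."""
    diff = y - q_pred
    return max(alpha * diff, (alpha - 1.0) * diff)


def crps_from_samples(samples: Sequence[float], y: float) -> float:
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    Uses all ordered pairs of samples (including identical pairs)
    for the second term, which costs O(n^2).
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    accuracy = sum(abs(x - y) for x in samples) / n
    spread = sum(abs(a - b) for a in samples for b in samples) / (2.0 * n * n)
    return accuracy - spread
```

For a single sample the spread term vanishes and the CRPS reduces to the absolute error, which is why CRPS is often read as a distributional generalization of it.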

Paper Structure

This paper contains 49 sections, 13 equations, 6 figures, 12 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Color-coded p-values from backtesting tsGT on electricity. Each rectangle shows the results of backtesting tsGT at one of the following levels: $50\%$, $75\%$, and $95\%$. The height of a rectangle equals $H=24$, the number of prediction steps, and its width equals $S=321$, the number of time series in electricity.
  • Figure 2: Visualization of data from each of the four real-life datasets. The first row displays a year's worth of data for a selected series. The second and the third rows zoom in on the suffix of those trajectories.
  • Figure 3: Reconstruction of apple sketches by tsGT. The ground truth images are shown in the top row. In the bottom row, the partial ground truth trajectory used to prompt tsGT is shown in blue, and the reconstructed trajectory is highlighted in orange.
  • Figure 4: Reconstruction of banana sketches by tsGT. The ground truth images are shown in the top row. In the bottom row, the partial ground truth trajectory used to prompt tsGT is shown in blue, and the reconstructed trajectory is highlighted in orange.
  • Figure 5: Reconstruction of castle sketches by tsGT. The ground truth images are shown in the top row. In the bottom row, the partial ground truth trajectory used to prompt tsGT is shown in blue, and the reconstructed trajectory is highlighted in orange.
  • ...and 1 more figure
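The p-values in Figure 1 come from backtesting predicted quantiles. As a rough illustration of how such a p-value can be computed, the sketch below implements the standard Kupiec proportion-of-failures likelihood-ratio test. How the paper counts exceedances per prediction step and series is not described in this section, so the function name `kupiec_pof_pvalue` and its inputs are assumptions; only the test statistic itself is the textbook form.

```python
import math


def kupiec_pof_pvalue(exceedances: int, n: int, p: float) -> float:
    """p-value of the Kupiec proportion-of-failures test.

    exceedances: number of windows where the outcome exceeded the quantile
    n:           number of backtesting windows
    p:           expected exceedance probability (0.05 for a 95% quantile)
    """
    if n <= 0 or not 0 <= exceedances <= n or not 0.0 < p < 1.0:
        raise ValueError("invalid inputs")

    def log_likelihood(prob: float) -> float:
        # Binomial log-likelihood without the constant term,
        # using the convention 0 * log(0) = 0.
        ll = 0.0
        if exceedances > 0:
            ll += exceedances * math.log(prob)
        if n - exceedances > 0:
            ll += (n - exceedances) * math.log(1.0 - prob)
        return ll

    p_hat = exceedances / n  # observed exceedance rate
    lr = -2.0 * (log_likelihood(p) - log_likelihood(p_hat))
    lr = max(lr, 0.0)  # guard against tiny negative rounding errors
    # Survival function of chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(lr / 2.0))
```

A small p-value means the observed exceedance rate is unlikely under the nominal level, i.e. the predicted quantile is poorly calibrated for that cell of the figure.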