tsGT: Stochastic Time Series Modeling With Transformer
Łukasz Kuciński, Witold Drzewakowski, Mateusz Olko, Piotr Kozakowski, Łukasz Maziarka, Marta Emilia Nowakowska, Łukasz Kaiser, Piotr Miłoś
TL;DR
tsGT is a stochastic, decoder-only transformer for time-series forecasting. It tokenizes values digit by digit, which enables non-parametric distribution modeling within a general-purpose architecture. The model is trained with a next-token objective and evaluated with rolling-window backtesting, using pointwise metrics (MAD, RMSE), distributional metrics (QL, CRPS), and Kupiec-based calibration tests. On the electricity, traffic, ETTm2, and weather datasets, tsGT outperforms state-of-the-art baselines on pointwise errors and surpasses its stochastic peers on distributional metrics. The results also show that temporal ordering matters: the model is not invariant to permutations of its input. The work further analyzes quantile performance, backtesting results, and input-order sensitivity. It discusses limitations such as memory complexity and the need for better uncertainty quantification and explainability, and it points to practical uses in scenario analysis and risk assessment for real-world time-series applications.
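To make the idea of digit-tokenization concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the value has already been normalized to the interval [0, 1), and the function names `encode_digits` and `decode_digits`, the base, and the number of digits are hypothetical choices made for illustration only.

```python
def encode_digits(x, base=10, n_digits=3):
    """Quantize x in [0, 1) and return its digits, most significant first.

    Assumption: x is already normalized; the normalization scheme used by
    tsGT is not described in this section.
    """
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    levels = base ** n_digits
    # Floor to one of `levels` equally spaced bins.
    q = min(int(x * levels), levels - 1)
    digits = []
    for _ in range(n_digits):
        q, d = divmod(q, base)
        digits.append(d)
    return digits[::-1]


def decode_digits(digits, base=10):
    """Map a digit sequence back to the left edge of its bin in [0, 1)."""
    q = 0
    for d in digits:
        q = q * base + d
    return q / base ** len(digits)
```

Each digit becomes one token, so a model trained with a next-token objective can represent an arbitrary distribution over the quantized value as a product of categorical distributions over digits, without assuming a parametric family.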
Abstract
Time series methods are of fundamental importance in virtually any field of science that deals with temporally structured data. Recently, there has been a surge of deterministic transformer models with time series-specific architectural biases. In this paper, we go in a different direction by introducing tsGT, a stochastic time series model built on a general-purpose transformer architecture. We focus on using a well-known and theoretically justified rolling window backtesting and evaluation protocol. We show that tsGT outperforms the state-of-the-art models on MAD and RMSE, and surpasses its stochastic peers on QL and CRPS, on four commonly used datasets. We complement these results with a detailed analysis of tsGT's ability to model the data distribution and predict marginal quantile values.
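For readers unfamiliar with the distributional metrics named above, the sketch below shows the standard textbook forms of the quantile (pinball) loss and a sample-based CRPS estimate. It is an illustration under assumptions: the function names are hypothetical, and any normalization or aggregation that the paper applies to QL and CRPS across time steps and series is not reproduced here.

```python
def pinball_loss(y, y_hat, q):
    """Pinball loss of predicting y_hat for the q-quantile when y is observed."""
    diff = y - y_hat
    return max(q * diff, (q - 1.0) * diff)


def crps_from_samples(samples, y):
    """Estimate CRPS from forecast samples via E|X - y| - 0.5 * E|X - X'|.

    X and X' are independent draws from the predictive distribution; both
    expectations are replaced by averages over the given samples.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    term_obs = sum(abs(s - y) for s in samples) / n
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread
```

For a stochastic model, `samples` would be a set of sampled forecasts for a single time step and `y` the realized value. With a single sample the CRPS estimate reduces to the absolute error, which is why CRPS is often read as a distributional generalization of MAD.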
