Timer-XL: Long-Context Transformers for Unified Time Series Forecasting
Yong Liu, Guo Qin, Xiangdong Huang, Jianmin Wang, Mingsheng Long
TL;DR
Timer-XL addresses the context bottleneck in time series forecasting with a decoder-only Transformer that treats forecasting as multivariate next-token prediction. Its TimeAttention mechanism combines a Kronecker-based masking scheme with RoPE temporal embeddings to capture fine-grained intra- and inter-series dependencies, enabling long-context forecasting over thousands of patch tokens ($NT$). The approach achieves state-of-the-art results on univariate, multivariate, and covariate-informed benchmarks, and shows strong zero-shot performance after large-scale pre-training, which supports its potential as a foundation model for time series. It also provides practical mechanisms for incorporating covariates and exogenous variables while preserving causality, giving a single scalable model for all of these forecasting settings.
Abstract
We present Timer-XL, a causal Transformer for unified time series forecasting. To uniformly predict multidimensional time series, we generalize next token prediction, predominantly adopted for 1D token sequences, to multivariate next token prediction. The paradigm formulates various forecasting tasks as a long-context prediction problem. We opt for decoder-only Transformers that capture causal dependencies from varying-length contexts for unified forecasting, making predictions on non-stationary univariate time series, multivariate series with complicated dynamics and correlations, as well as covariate-informed contexts that include exogenous variables. Technically, we propose a universal TimeAttention to capture fine-grained intra- and inter-series dependencies of flattened time series tokens (patches), which is further enhanced by deft position embedding for temporal causality and variable equivalence. Timer-XL achieves state-of-the-art performance across task-specific forecasting benchmarks through a unified approach. Based on large-scale pre-training, Timer-XL achieves state-of-the-art zero-shot performance, making it a promising architecture for pre-trained time series models. Code is available at this repository: https://github.com/thuml/Timer-XL.
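To make the Kronecker-based masking mentioned in the TL;DR more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the attention mask over the flattened tokens can be written as the Kronecker product of an $N \times N$ variable-dependency matrix and a $T \times T$ lower-triangular causal mask, and that tokens are flattened variable by variable. The function names and the example dependency matrix are hypothetical and chosen only for illustration.

```python
def causal_mask(num_patches):
    """Lower-triangular T x T mask: patch t may attend to patches 0..t."""
    return [[1 if s <= t else 0 for s in range(num_patches)]
            for t in range(num_patches)]


def kronecker(a, b):
    """Kronecker product of two 2D lists (a is N x N, b is T x T)."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a)) for k in range(len(b))
    ]


def time_attention_mask(variable_deps, num_patches):
    """Build an (N*T) x (N*T) mask over flattened patch tokens.

    Assumption: variable_deps[i][j] == 1 means variable i may attend to
    variable j, and token (variable i, patch k) sits at index i * T + k.
    """
    return kronecker(variable_deps, causal_mask(num_patches))


# Hypothetical example: 2 variables, 3 patches each.
# Variable 1 may look at variable 0 (e.g. a covariate), but not vice versa.
deps = [[1, 0],
        [1, 1]]
mask = time_attention_mask(deps, 3)
for row in mask:
    print(row)
```

Under these assumptions, every token sees only current and earlier patches of the variables it is allowed to depend on, so temporal causality holds across series as well as within each one. Setting the dependency matrix to all ones would correspond to full multivariate attention, and the identity matrix to independent univariate forecasting.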
