Timer: Generative Pre-trained Transformers Are Large Time Series Models

Yong Liu, Haoran Zhang, Chenyu Li, Xiangdong Huang, Jianmin Wang, Mingsheng Long

TL;DR

This work argues that large time series models (LTSMs) can overcome data scarcity by leveraging generative pre-training. It introduces UTSD, a dataset of up to 1 billion time points, and the single-series sequence (S3) format, which unifies heterogeneous time series so that a decoder-only Transformer (Timer) can be trained autoregressively on them. Timer is pre-trained by next token prediction and adapted to forecasting, imputation, and anomaly detection as a single generative task, demonstrating strong few-shot and zero-shot capabilities. The study highlights scalable data and model infrastructure as key enablers for LTSMs and lays groundwork for broader task generality in time series analysis.

Abstract

Deep learning has contributed remarkably to the advancement of time series analysis. Still, deep models can encounter performance bottlenecks in real-world data-scarce scenarios, which can be concealed due to the performance saturation with small models on current benchmarks. Meanwhile, large models have demonstrated great powers in these scenarios through large-scale pre-training. Continuous progress has been achieved with the emergence of large language models, exhibiting unprecedented abilities such as few-shot generalization, scalability, and task generality, which are however absent in small deep models. To change the status quo of training scenario-specific small models from scratch, this paper aims at the early development of large time series models (LTSM). During pre-training, we curate large-scale datasets with up to 1 billion time points, unify heterogeneous time series into single-series sequence (S3) format, and develop the GPT-style architecture toward LTSMs. To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task. The outcome of this study is a Time Series Transformer (Timer), which is generative pre-trained by next token prediction and adapted to various downstream tasks with promising capabilities as an LTSM. Code and datasets are available at: https://github.com/thuml/Large-Time-Series-Model.
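
To make the S3 idea and the next token prediction objective more concrete, the following is a minimal sketch, not the authors' implementation. It assumes per-variate z-score normalization, non-overlapping windows of a fixed context length, and tokens formed from consecutive segments of time points; the function names `to_s3` and `next_token_pairs` and the parameters `context_len` and `token_len` are hypothetical and chosen only for illustration.

```python
import numpy as np


def to_s3(series_list, context_len, token_len):
    """Flatten heterogeneous series into fixed-length single-variate token sequences.

    series_list: arrays shaped (time,) or (time, variates); lengths and
    variate counts may differ between entries.
    Returns an array shaped (num_windows, context_len // token_len, token_len).
    """
    if context_len % token_len != 0:
        raise ValueError("context_len must be a multiple of token_len")
    tokens_per_window = context_len // token_len
    windows = []
    for series in series_list:
        series = np.asarray(series, dtype=float)
        if series.ndim == 1:
            series = series[:, None]
        for v in range(series.shape[1]):
            x = series[:, v]
            # Assumption: normalize each variate independently (z-score).
            x = (x - x.mean()) / (x.std() + 1e-8)
            # Assumption: non-overlapping windows; leftover points are dropped.
            for start in range(0, len(x) - context_len + 1, context_len):
                windows.append(x[start:start + context_len])
    if not windows:
        return np.empty((0, tokens_per_window, token_len))
    pool = np.stack(windows)
    # Each window becomes a sequence of tokens, one token per segment.
    return pool.reshape(len(pool), tokens_per_window, token_len)


def next_token_pairs(tokens):
    """Shift by one token: the model sees tokens[:-1] and is trained to emit tokens[1:]."""
    return tokens[:, :-1], tokens[:, 1:]
```

Because every variate is reduced to the same single-series shape, series from different domains can be pooled into one training set regardless of their original length or number of variates.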

Paper Structure (58 sections, 6 equations, 21 figures, 17 tables)

Figures (21)

  • Figure 1: Performance of PatchTST (Nie et al., 2022) under different degrees of data scarcity. The degradation is reported as the relative increase in MSE compared with training on full samples.
  • Figure 2: Illustration of Unified Time Series Dataset (UTSD) that is composed of various time series domains with hierarchical capacities.
  • Figure 3: Pre-training strategy for heterogeneous time series.
  • Figure 4: Architectures of typical Transformer-based forecasters.
  • Figure 5: Illustration of our generative task unification: (1) The generatively pre-trained Timer can naturally predict the next series by iterative autoregression; (2) By introducing masked tokens during adaptation, Timer generates imputations from the previous context and assembles them with the observed part; (3) We propose predictive anomaly detection by predicting normal series in advance.
  • ...and 16 more figures
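
The Figure 5 caption describes forecasting as iterative autoregression and anomaly detection as predicting the normal series in advance. A rough sketch of those two ideas follows, under stated assumptions: `persistence_predictor` is a hypothetical stand-in for a pre-trained model (it simply copies the last token), and scoring anomalies by the mean squared error between a token and its prediction is an illustrative choice rather than the paper's confirmed procedure. Imputation with masked tokens is omitted because it depends on adaptation details not given here.

```python
import numpy as np


def persistence_predictor(tokens):
    """Hypothetical stand-in for a pre-trained model: repeat the last token."""
    return tokens[-1].copy()


def forecast(predict_next, context, steps):
    """Iterative autoregression: feed each predicted token back as new context."""
    context = np.asarray(context, dtype=float)
    history = list(context)
    generated = []
    for _ in range(steps):
        nxt = predict_next(np.stack(history))
        history.append(nxt)
        generated.append(nxt)
    if not generated:
        return np.empty((0, context.shape[1]))
    return np.stack(generated)


def anomaly_scores(predict_next, tokens):
    """Score each token (from the second onward) by its squared prediction error.

    Assumption: a large gap between the observed token and the token predicted
    from the preceding context marks a likely anomaly.
    """
    tokens = np.asarray(tokens, dtype=float)
    scores = []
    for i in range(1, len(tokens)):
        predicted = predict_next(tokens[:i])
        scores.append(float(np.mean((tokens[i] - predicted) ** 2)))
    return np.array(scores)
```

Both functions take the predictor as an argument, so the same next-token model could serve either task without a task-specific head, which is the point of casting the tasks as one generative problem.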