Only the Curve Shape Matters: Training Foundation Models for Zero-Shot Multivariate Time Series Forecasting through Next Curve Shape Prediction

Cheng Feng, Long Huang, Denis Krompass

TL;DR

This work introduces General Time Transformer (GTT), an encoder-only foundation model pretrained on a large, diverse time-series corpus to enable zero-shot multivariate forecasting. Forecasting is reframed as channel-wise next-curve-shape prediction using fixed-size curve patches, with a dual temporal and cross-channel attention mechanism and a RevIN-based inference workflow. GTT achieves strong zero-shot performance across benchmark datasets, often rivaling or surpassing supervised baselines, and exhibits favorable scaling behavior with model size and pretraining data. The approach highlights the potential of encoder-only Transformer architectures as scalable foundation models for time-series forecasting with practical implications for cross-domain deployment and fine-tuning efficiency.

Abstract

We present General Time Transformer (GTT), an encoder-only style foundation model for zero-shot multivariate time series forecasting. GTT is pretrained on a large dataset of 200M high-quality time series samples spanning diverse domains. In our proposed framework, the task of multivariate time series forecasting is formulated as a channel-wise next curve shape prediction problem, where each time series sample is represented as a sequence of non-overlapping curve shapes with a unified numerical magnitude. GTT is trained to predict the next curve shape based on a window of past curve shapes in a channel-wise manner. Experimental results demonstrate that GTT exhibits superior zero-shot multivariate forecasting capabilities on unseen time series datasets, even surpassing state-of-the-art supervised baselines. Additionally, we investigate the impact of varying GTT model parameters and training dataset scales, observing that the scaling law also holds in the context of zero-shot multivariate time series forecasting.
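
The curve-shape formulation described above can be illustrated with a small preprocessing sketch. This is not the authors' code: the patch length of 64 is a hypothetical value chosen for illustration, plain per-channel standardization stands in for RevIN (no learnable affine parameters), and normalizing before zero-padding is an assumed ordering. The 1024-point context and the front zero-padding follow the paper's Figure 1 caption.

```python
import numpy as np

def to_curve_patches(series, context_len=1024, patch_len=64, eps=1e-5):
    """Turn a (time, channels) array into per-channel curve-shape patches.

    Returns patches of shape (channels, context_len // patch_len, patch_len)
    plus the per-channel mean and std needed to undo the normalization.
    patch_len=64 is an illustrative assumption, not a value from the source.
    """
    assert context_len % patch_len == 0, "context must split into whole patches"
    series = np.asarray(series, dtype=np.float64)[-context_len:]
    # Per-channel standardization so every channel has a unified magnitude.
    mean = series.mean(axis=0, keepdims=True)
    std = series.std(axis=0, keepdims=True) + eps
    normed = (series - mean) / std
    # Pad zeros in front of samples shorter than the context window.
    pad = context_len - normed.shape[0]
    if pad > 0:
        zeros = np.zeros((pad, normed.shape[1]))
        normed = np.concatenate([zeros, normed], axis=0)
    # (time, channels) -> (channels, num_patches, patch_len), non-overlapping.
    patches = normed.T.reshape(normed.shape[1], context_len // patch_len, patch_len)
    return patches, mean, std

def denormalize(pred, mean, std):
    """Map a (time, channels) prediction in normalized units back to data units."""
    return pred * std + mean
```

Under these assumptions, a model would see each channel as a sequence of 16 normalized curve shapes and predict the next one, which `denormalize` then maps back to the original scale.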

Paper Structure (24 sections, 5 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: Model overview. We split an input multivariate time series channel-wise into fixed-size, non-overlapping patches (curve shapes), linearly embed each patch, add position encodings, and feed the resulting sequence of patches to the encoder. Compared with the standard Transformer, the encoder has an extra channel attention stage; the temporal and channel attention share the same weights. We add a linear head to the last token to perform forecasting. During inference, we add RevIN layers to normalize and denormalize time series channels, and pad zeros in front of time series samples with fewer than 1024 time points.
  • Figure 2: Illustration of Temporal and Channel Attention
  • Figure 3: Zero-shot multivariate forecasting performance on benchmark datasets of GTT with different model parameter scales. The results for ETT are averaged from the four ETT datasets.
  • Figure 4: Zero-shot multivariate forecasting performance on benchmark datasets of GTT-Large with different training data scales. The results for ETT are averaged from the four ETT datasets.
  • Figure 5: Zero-shot forecast of the last 24 months' values in the Air Passenger dataset produced by GTT-Tiny (left), GTT-Small (middle), and GTT-Large (right). We observe that larger models capture the accelerating upward trend better.
  • ...and 1 more figure
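
The temporal and channel attention stages named in the captions of Figures 1 and 2 can be sketched roughly as follows. This is a minimal single-head illustration, not the authors' implementation: the residual connections, the absence of masking, and the omission of layer normalization and feed-forward layers are all simplifying assumptions. The only detail taken from the source is that both stages reuse the same attention weights.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head self-attention over the sequence axis of x: (batch, seq, d)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def temporal_then_channel(x, wq, wk, wv):
    """x has shape (channels, num_patches, d_model)."""
    # Temporal stage: each channel attends across its own patches.
    x = x + self_attention(x, wq, wk, wv)
    # Channel stage: same weights, but the sequence axis is now the channels.
    xc = np.swapaxes(x, 0, 1)  # (num_patches, channels, d_model)
    xc = xc + self_attention(xc, wq, wk, wv)
    return np.swapaxes(xc, 0, 1)  # back to (channels, num_patches, d_model)
```

Because this sketch adds no position information along the channel axis, reordering the input channels simply reorders the output, so the same weights can be applied to inputs with any number of channels.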