Pre-trained Forecasting Models: Strong Zero-Shot Feature Extractors for Time Series Classification

Andreas Auer, Daniel Klotz, Sebastian Böck, Sepp Hochreiter

TL;DR

This paper asks whether pre-training for time series forecasting yields representations that generalize to time series classification. It proposes a zero-shot pipeline in which a frozen forecasting encoder extracts embeddings and a simple classifier is trained on top, and it investigates layer and sequence aggregation plus two embedding augmentations to improve transferability. The findings show strong zero-shot performance of forecasting models, sometimes surpassing classification-focused pre-training, and reveal a positive link between forecasting quality and classification performance. The results support learning to forecast as a viable route to general-purpose time series foundation models.

Abstract

Recent research on time series foundation models has primarily focused on forecasting, leaving it unclear how generalizable their learned representations are. In this study, we examine whether frozen pre-trained forecasting models can provide effective representations for classification. To this end, we compare different representation extraction strategies and introduce two model-agnostic embedding augmentations. Our experiments show that the best forecasting models achieve classification accuracy that matches or even surpasses that of state-of-the-art models pre-trained specifically for classification. Moreover, we observe a positive correlation between forecasting and classification performance. These findings challenge the assumption that task-specific pre-training is necessary, and suggest that learning to forecast may provide a powerful route toward constructing general-purpose time series foundation models.
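The feature-extraction pipeline outlined above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `toy_frozen_encoder` is a hypothetical stand-in with fixed random weights, not any of the evaluated forecasting models, and the aggregation options are only examples. This summary does not define the two augmentations, so the code assumes that "Stat" appends simple summary statistics of the raw series and that "Diff" appends an embedding of the first-order differenced series.

```python
import numpy as np


def toy_frozen_encoder(series, n_layers=3, patch_len=8, d_model=16, seed=0):
    """Hypothetical stand-in for a frozen pre-trained forecasting encoder.

    Weights are fixed by the seed and never updated (the "frozen" part).
    Returns hidden states of shape (n_layers, n_patches, d_model).
    """
    series = np.asarray(series, dtype=float)
    n_patches = len(series) // patch_len
    if n_patches == 0:
        raise ValueError("series is shorter than one patch")
    rng = np.random.default_rng(seed)
    patches = series[: n_patches * patch_len].reshape(n_patches, patch_len)
    hidden = patches @ rng.standard_normal((patch_len, d_model))
    layers = []
    for _ in range(n_layers):
        weights = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        hidden = np.tanh(hidden @ weights)
        layers.append(hidden)
    return np.stack(layers)


def embed(series, layer_agg="mean", seq_agg="mean"):
    """Collapse the encoder's hidden states into one fixed-size vector."""
    states = toy_frozen_encoder(series)
    # Layer aggregation: combine the per-layer hidden states.
    if layer_agg == "mean":
        per_token = states.mean(axis=0)
    elif layer_agg == "last":
        per_token = states[-1]
    elif layer_agg == "concat":
        per_token = np.concatenate(list(states), axis=-1)
    else:
        raise ValueError(f"unknown layer_agg: {layer_agg}")
    # Sequence aggregation: combine the per-patch vectors.
    if seq_agg == "mean":
        return per_token.mean(axis=0)
    if seq_agg == "max":
        return per_token.max(axis=0)
    if seq_agg == "last":
        return per_token[-1]
    raise ValueError(f"unknown seq_agg: {seq_agg}")


def stat_features(series):
    """Assumed 'Stat' augmentation: summary statistics of the raw series."""
    return np.array([series.mean(), series.std(), series.min(), series.max()])


def extract_features(series, use_stat=True, use_diff=True):
    """Embedding of the series, optionally extended by the two assumed augmentations."""
    series = np.asarray(series, dtype=float)
    parts = [embed(series)]
    if use_diff:
        # Assumed 'Diff' augmentation: embed the first-order differences too.
        parts.append(embed(np.diff(series)))
    if use_stat:
        parts.append(stat_features(series))
    return np.concatenate(parts)


if __name__ == "__main__":
    x = np.sin(np.linspace(0.0, 6.28, 64))
    print(extract_features(x).shape)
```

In this sketch only the downstream classifier would be trained: the vectors returned by `extract_features` could be stacked into a feature matrix and passed to a simple model such as a Random Forest (for example scikit-learn's `RandomForestClassifier`), while the encoder itself stays untouched.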

Paper Structure

This paper contains 36 sections, 30 figures, 11 tables.

Figures (30)

  • Figure 1: Classification accuracy of different models on the univariate, multivariate, and combined benchmarks (Random Forest). "Stat+Diff" shows results with both proposed augmentations applied; "no Aug" uses the pure forecasting model representations. "ZS" indicates models that did not have access to the benchmark's training data during pre-training.
  • Figure 2: Classification accuracy versus forecasting performance (CRPS on GiftEval) of the evaluated models. The trend (red line) shows that better forecasting ability (lower CRPS) relates to higher classification accuracy.
  • Figure 3: Results for TiRex for the layer and sequence aggregation ablation experiments. (a) Average accuracy on univariate (Uni), multivariate (Multi), and overall (Comb) benchmark datasets. Sorted by overall accuracy. (b) Critical difference diagram of the average accuracy ranks.
  • Figure 4: Results for Moirai 1.1 (Base) for the layer and sequence aggregation ablation experiments. (a) Average accuracy on univariate (Uni), multivariate (Multi), and overall (Comb) benchmark datasets. Sorted by overall accuracy. (b) Critical difference diagram of the average accuracy ranks.
  • Figure 5: Results for TimesFM 2.0 for the layer and sequence aggregation ablation experiments. (a) Average accuracy on univariate (Uni), multivariate (Multi), and overall (Comb) benchmark datasets. Sorted by overall accuracy. (b) Critical difference diagram of the average accuracy ranks.
  • ...and 25 more figures