ElasTST: Towards Robust Varied-Horizon Forecasting with Elastic Time-Series Transformer

Jiawen Zhang, Shun Zheng, Xumeng Wen, Xiaofang Zhou, Jiang Bian, Jia Li

TL;DR

ElasTST uses a non-autoregressive design with placeholders and structured self-attention masks, ensuring that future outputs are invariant to changes in the inference horizon. The authors position it as a robust solution for the practical need of varied-horizon forecasting.

Abstract

Numerous industrial sectors require models capable of providing robust forecasts across various horizons. Despite recent strides in crafting specific architectures for time-series forecasting and developing pre-trained universal models, a comprehensive examination of their ability to accommodate varied-horizon forecasting during inference is still lacking. This paper bridges this gap through the design and evaluation of the Elastic Time-Series Transformer (ElasTST). The ElasTST model incorporates a non-autoregressive design with placeholders and structured self-attention masks, ensuring that future outputs are invariant to adjustments in inference horizons. A tunable version of rotary position embedding is also integrated into ElasTST to capture time-series-specific periods and enhance adaptability to different horizons. Additionally, ElasTST employs a multi-scale patch design, effectively integrating both fine-grained and coarse-grained information. During the training phase, ElasTST uses a horizon reweighting strategy that approximates the effect of random sampling across multiple horizons with a single fixed horizon setting. Through comprehensive experiments and comparisons with state-of-the-art time-series architectures and contemporary foundation models, we demonstrate the efficacy of ElasTST's unique design elements. Our findings position ElasTST as a robust solution for the practical necessity of varied-horizon forecasting.
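The horizon-invariance property mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single attention layer in NumPy, and it assumes that every token (observed or placeholder) may attend only to observed tokens. The names `structured_mask`, `masked_attention`, and `forecast_tokens` are hypothetical. Under these assumptions, the output at a given placeholder position does not depend on how many placeholders follow it.

```python
import numpy as np


def structured_mask(n_obs: int, n_placeholder: int) -> np.ndarray:
    """Boolean mask; True means the query (row) may attend to the key (column).

    Assumption for this sketch: all tokens attend to observed tokens only,
    so placeholders never influence each other or the observed tokens.
    """
    n = n_obs + n_placeholder
    mask = np.zeros((n, n), dtype=bool)
    mask[:, :n_obs] = True
    return mask


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    # Every row keeps at least the observed keys, so the row max is finite.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def forecast_tokens(obs, placeholders, wq, wk, wv):
    """Run one masked attention layer and return the placeholder outputs."""
    x = np.concatenate([obs, placeholders], axis=0)
    mask = structured_mask(len(obs), len(placeholders))
    out = masked_attention(x @ wq, x @ wk, x @ wv, mask)
    return out[len(obs):]


# Toy check: the first 3 placeholder outputs agree for horizons 3 and 6.
rng = np.random.default_rng(0)
d = 8
obs = rng.normal(size=(4, d))           # observed patch embeddings (toy data)
placeholders = rng.normal(size=(6, d))  # position-specific placeholders (toy data)
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))

short = forecast_tokens(obs, placeholders[:3], wq, wk, wv)
long = forecast_tokens(obs, placeholders, wq, wk, wv)
assert np.allclose(short, long[:3])
```

With an unrestricted (full) attention mask, the same check would generally fail, because the extra placeholders would change the softmax normalization for every query.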

Paper Structure

This paper contains 52 sections, 12 equations, 12 figures, 8 tables.

Figures (12)

  • Figure 1: Overview of the ElasTST Architecture. ElasTST employs (a) structured attention masks for placeholders to ensure consistent outputs across varied forecasting horizons. It incorporates (b) tunable RoPE customized to time series periodicities, enhancing its robustness. The architecture also integrates (c) a multi-scale patch assembly that merges fine-grained and coarse-grained details for improved forecasting accuracy. Furthermore, we implement (d) a training horizon reweighting scheme during the training phase, which effectively simulates random sampling of forecasting horizons, reducing the need for additional sampling efforts.
  • Figure 2: Performance when models are trained once and evaluated across varying forecasting horizons. Models except TimesFM and MOIRAI are trained with a forecasting horizon of 720 and tasked with predicting across multiple horizons. A vertical red dashed line separates the seen horizons (96, 192, 336, 720) from the unseen horizon (1024). We use a dashed line to denote the datasets on which a model was pre-trained, e.g., both TimesFM and MOIRAI have leveraged Traffic datasets for their pre-training. ETT reports results averaged over the ETTh1, ETTh2, ETTm1, and ETTm2 datasets. Models that lack inherent elasticity use a truncation strategy for shorter forecasts, and the foundation models use their pre-trained checkpoints and recommended configurations for inference.
  • Figure 3: Ablation study for the structured attention masks, tunable RoPE, and multi-patch assembly. A vertical red dashed line indicates the training horizon.
  • Figure 4: Ablation study for designs in position embedding. A vertical red dashed line distinguishes between seen horizons and unseen horizons.
  • Figure 5: Performance of patch size selections. Results are averaged across all datasets and training horizons of $\{96, 192, 336, 720\}$. '8_16_32' represents a multi-patch configuration of $\bm{p}=\{8, 16, 32\}$.
  • ...and 7 more figures
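
The training horizon reweighting scheme in Figure 1(d) is described as simulating random sampling of forecasting horizons. One possible reading, sketched below, is not taken from the paper: it assumes the horizon h is sampled uniformly from 1..H and that each sample's loss is the mean squared error over its first h steps. Under those assumptions, the expected loss equals a fixed-horizon loss in which step t carries the weight w_t = (1/H) * sum over h from t to H of 1/h. The function names `horizon_weights` and `reweighted_mse` are hypothetical.

```python
import numpy as np


def horizon_weights(max_horizon: int) -> np.ndarray:
    """Per-step weights w_t = (1/H) * sum_{h=t}^{H} 1/h (assumed sampling scheme)."""
    inv = 1.0 / np.arange(1, max_horizon + 1)
    # Reverse cumulative sum gives the suffix sums of 1/h.
    suffix = np.cumsum(inv[::-1])[::-1]
    return suffix / max_horizon


def reweighted_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Weighted squared error over the last axis, averaged over the batch."""
    w = horizon_weights(pred.shape[-1])
    return float(np.sum(w * (pred - target) ** 2, axis=-1).mean())


# Toy comparison against the explicit expectation over sampled horizons.
rng = np.random.default_rng(0)
H = 12
pred = rng.normal(size=(1, H))
target = rng.normal(size=(1, H))
sq_err = (pred - target)[0] ** 2
explicit = np.mean([sq_err[:h].mean() for h in range(1, H + 1)])
assert np.isclose(explicit, reweighted_mse(pred, target))
```

The weights sum to one and decrease with t, so near-term steps dominate the loss, which is what averaging over many shorter sampled horizons would also do.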