Sparse Transformer with Local and Seasonal Adaptation for Multivariate Time Series Forecasting

Yifan Zhang, Rui Wu, Sergiu M. Dascalu, Frederick C. Harris

TL;DR

This work tackles the inefficiency and limited scalability of full attention in multivariate time series forecasting by introducing Dozer Attention, a horizon-aware sparse mechanism composed of Local, Stride, and Vary components. Integrated into the Dozerformer framework, it decomposes inputs into seasonal and trend parts and uses patch-based embeddings to forecast with an encoder–decoder, achieving strong accuracy while dramatically reducing query–key computations. Empirical results on nine benchmarks show Dozerformer outperforms recent state-of-the-art methods and offers substantial efficiency gains, with ablations validating the contribution of each Dozer component. The approach enables scalable, horizon-adaptive forecasting for diverse MTS data, with practical implications for real-world time series prediction tasks.

Abstract

Transformers have achieved remarkable performance in multivariate time series (MTS) forecasting due to their capability to capture long-term dependencies. However, the canonical attention mechanism has two key limitations: (1) its quadratic time complexity limits the sequence length, and (2) it generates future values from the entire historical sequence. To address these limitations, we propose a Dozer Attention mechanism consisting of three sparse components: (1) Local, in which each query attends exclusively to keys within a localized window of neighboring time steps; (2) Stride, which enables each query to attend to keys at predefined intervals; and (3) Vary, which allows queries to selectively attend to keys from a subset of the historical sequence. Notably, the size of this subset dynamically expands as forecasting horizons extend. These three components are designed to capture essential attributes of MTS data, including locality, seasonality, and global temporal dependencies. Additionally, we present the Dozerformer Framework, incorporating the Dozer Attention mechanism for the MTS forecasting task. We evaluated the proposed Dozerformer framework against recent state-of-the-art methods on nine benchmark datasets and confirmed its superior performance. The experimental results indicate that excluding a subset of historical time steps from the time series forecasting process does not compromise accuracy while significantly improving efficiency. Code is available at https://github.com/GRYGY1215/Dozerformer.
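
To make the three sparse components concrete, the following is a minimal NumPy sketch of how Local, Stride, and Vary masks could be built and applied. It is an illustration under stated assumptions, not the authors' implementation: the function names, the parameters `window`, `stride`, and `base`, and the exact indexing rules (a symmetric window, strides measured from the query position, and a Vary subset that grows by one key per horizon step) are all hypothetical choices made for this example.

```python
import numpy as np

def local_mask(n_q, n_k, window):
    """Assumed rule: query i sees keys j with |i - j| <= window // 2."""
    i = np.arange(n_q)[:, None]
    j = np.arange(n_k)[None, :]
    return np.abs(i - j) <= window // 2

def stride_mask(n_q, n_k, stride):
    """Assumed rule: query i sees keys whose offset from i is a multiple of stride."""
    i = np.arange(n_q)[:, None]
    j = np.arange(n_k)[None, :]
    return (i - j) % stride == 0

def vary_mask(n_q, n_k, base):
    """Assumed rule: the query at horizon step h sees the most recent base + h keys."""
    h = np.arange(n_q)[:, None]
    j = np.arange(n_k)[None, :]
    return j >= n_k - (base + h)

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to positions where mask is True.

    Every query row must have at least one allowed key.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Self-attention pattern: union of Local and Stride over 8 time steps.
self_mask = local_mask(8, 8, window=3) | stride_mask(8, 8, stride=4)

# Vary pattern: 4 horizon steps attending to an 8-step history.
cross_vary = vary_mask(4, 8, base=2)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
out = masked_attention(x, x, x, self_mask)

print("fraction of query-key pairs kept:", self_mask.mean())
print("keys visible per horizon step:", cross_vary.sum(axis=1))
```

This sketch computes the dense score matrix and then masks it, so it shows only the sparsity pattern. An efficient implementation would avoid computing the masked query-key products in the first place, which is where the efficiency gain described in the abstract comes from.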

Paper Structure (25 sections, 11 equations, 7 figures, 7 tables)

Figures (7)

  • Figure 1: (a) The heatmap illustrates the correlation among 168 time steps for the ETTh1, ETTm1, Weather, and Exchange-Rate datasets. (b) Full Attention: Generates predictions using the entire historical sequence. Local Component: Utilizes time steps within a specified window. Stride Component: Utilizes time steps at fixed intervals from the target. Vary Component: Adaptively uses historical time steps as the forecasting horizon increases.
  • Figure 2: The architecture of our proposed Dozerformer framework.
  • Figure 3: The illustration of the Dozer attention mechanism. Upper: The self-attention consists of Local and Stride. Lower: The cross attention consists of Local, Stride, and Vary.
  • Figure 4: The forecasting results in terms of MSE for different Local, Stride, and Vary values at the horizon of 720.
  • Figure 5: The forecasting results in terms of MAE for different look-back window sizes at horizons 96 and 720.
  • ...and 2 more figures