Introducing Spectral Attention for Long-Range Dependency in Time Series Forecasting

Bong Gyun Kang, Dongjun Lee, HyunGi Kim, DoHyun Chung, Sungroh Yoon

TL;DR

This work introduces a fast and effective Spectral Attention mechanism, which preserves temporal correlations among samples and facilitates the handling of long-range information while maintaining the base model structure.

Abstract

Sequence modeling faces challenges in capturing long-range dependencies across diverse tasks. Recent linear and transformer-based forecasters have shown superior performance in time series forecasting. However, they are constrained by their inherent inability to effectively address long-range dependencies in time series data, primarily due to using fixed-size inputs for prediction. Furthermore, they typically sacrifice essential temporal correlation among consecutive training samples by shuffling them into mini-batches. To overcome these limitations, we introduce a fast and effective Spectral Attention mechanism, which preserves temporal correlations among samples and facilitates the handling of long-range information while maintaining the base model structure. Spectral Attention preserves long-period trends through a low-pass filter and allows gradients to flow between samples. Spectral Attention can be seamlessly integrated into most sequence models, allowing models with fixed-size look-back windows to capture long-range dependencies over thousands of steps. Through extensive experiments on 11 real-world time series datasets using 7 recent forecasting models, we consistently demonstrate the efficacy of our Spectral Attention mechanism, achieving state-of-the-art results.
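
To make the low-pass behaviour concrete, consider a plain exponential moving average (EMA) of a feature $F_t$ with momentum $\alpha \in (0, 1)$. The recursion and symbols below are a textbook illustration assumed here, not necessarily the paper's exact formulation:

$$
M_t = \alpha M_{t-1} + (1-\alpha) F_t, \qquad
\left|H(e^{j\omega})\right|^2 = \frac{(1-\alpha)^2}{1 - 2\alpha\cos\omega + \alpha^2}.
$$

The gain is $1$ at $\omega = 0$ and falls to $\left(\frac{1-\alpha}{1+\alpha}\right)^2$ at $\omega = \pi$, so slow trends pass through while fast fluctuations are attenuated. The weight on a sample $k$ steps back decays as $\alpha^k$, giving an effective memory of roughly $1/(1-\alpha)$ steps. For example, $\alpha = 0.999$ corresponds to about 1000 steps, which can be far longer than a fixed look-back window.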

Paper Structure

This paper contains 26 sections, 7 equations, 17 figures, 12 tables, 1 algorithm.

Figures (17)

  • Figure 1: (a) Training data are sampled for each time step from continuous sequences, exhibiting high temporal correlations. (b) Conventional approaches simply ignore this temporal information with a shuffled batch. (c) We address the temporal correlation between the samples for the first time, enabling the model to consider long-range dependencies that surpass the look-back window.
  • Figure 2: (a) The plug-in Spectral Attention (SA) module takes a subset of the intermediate feature $F$ and returns $F'$ with long-range information beyond the look-back window. The model is trained end-to-end, and gradients flow through the SA module. (b) To capture long-range dependencies, SA stores momentums of the feature $F$ generated from the sequential inputs. Multiple momentum parameters $\alpha_i$ capture dependencies across various ranges. (c) The SA module computes $F'$ by attending to multiple low-frequency ($M^{\alpha_i}$) and high-frequency ($F-M^{\alpha_i}$) components and the feature ($F$) using a learnable Spectral Attention Matrix (SA-Matrix).
  • Figure 3: The Batched Spectral Attention (BSA) module takes a sequentially sampled mini-batch $\left\{X_t, \ldots, X_{t+B-1}\right\}$ and computes the corresponding EMA momentums $\left\{M_t, \ldots, M_{t+B-1}\right\}$ over time. This is done via a single matrix multiplication, enabling parallelization. We make the momentum parameter $\alpha_i$ learnable, allowing the model to directly learn the periodicity of the information essential for future prediction.
  • Figure 4: Analysis of the SA-Matrix of the DLinear model trained on the 720-step prediction task for the Weather and ETTh1 datasets. (a) Heatmap of the SA-Matrix. (b)-(d) Attention and FFT graphs.
  • Figure 5: Results of the iTransformer model on the synthetic (a) ETTh1 and (b) ETTh2 datasets. The x-axis is the prediction length (96, 192, 336, 720), and the y-axis is the performance improvement (%) over the base model. Each color represents a different period of the sine wave added to the natural data; 0 indicates the original data and serves as the baseline.
  • ...and 12 more figures
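
The Figure 3 caption says the EMA momentums for a sequentially sampled mini-batch are obtained with a single matrix multiplication. The following is a minimal NumPy sketch of that idea, not the authors' implementation. It assumes the EMA convention $M_t = \alpha M_{t-1} + (1-\alpha) F_t$, and the function names `ema_sequential` and `ema_matmul` are hypothetical, introduced only for illustration.

```python
import numpy as np


def ema_sequential(features, m0, alpha):
    """Reference loop (assumed convention): M_t = alpha * M_{t-1} + (1 - alpha) * F_t."""
    out = np.empty_like(features, dtype=float)
    m = m0
    for t, f in enumerate(features):
        m = alpha * m + (1.0 - alpha) * f
        out[t] = m
    return out


def ema_matmul(features, m0, alpha):
    """Same EMA over a whole sequential batch, using one matrix product.

    Unrolling the recursion gives
        M_t = alpha**(t + 1) * M_0 + (1 - alpha) * sum_{s <= t} alpha**(t - s) * F_s,
    so the batch of momentums is a lower-triangular weight matrix times the features,
    plus a decayed copy of the momentum carried over from earlier batches.
    """
    b = features.shape[0]
    steps = np.arange(b)
    lag = steps[:, None] - steps[None, :]  # lag[t, s] = t - s
    # Lower-triangular weights (1 - alpha) * alpha**(t - s); zero above the diagonal.
    weights = np.where(lag >= 0, (1.0 - alpha) * alpha ** np.maximum(lag, 0), 0.0)
    carry = alpha ** (steps + 1)  # decay applied to the stored momentum M_0
    return weights @ features + carry[:, None] * m0[None, :]


# Hypothetical toy batch: B = 8 consecutive samples with 4 feature channels.
rng = np.random.default_rng(0)
F = rng.normal(size=(8, 4))
M0 = np.zeros(4)

# Several momentum values, mirroring the idea of multiple alpha_i with different ranges.
for alpha in (0.9, 0.99, 0.999):
    assert np.allclose(ema_sequential(F, M0, alpha), ema_matmul(F, M0, alpha))
```

In this sketch the last row of the result would be stored as the starting momentum for the next batch, which is why the samples must stay in temporal order rather than being shuffled. In a trainable version, `alpha` would be a learnable parameter in an autodiff framework so that gradients can reach it through the weight matrix.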