WaveRoRA: Wavelet Rotary Route Attention for Multivariate Time Series Forecasting

Aobo Liang, Yan Sun, Nadra Guizani

TL;DR

This work proposes a wavelet learning framework that integrates wavelet transforms with Transformers to exploit both time and frequency characteristics, and introduces WaveRoRA, a unified model that uses Rotary Route Attention (RoRA) to capture inter-series dependencies in the wavelet domain.

Abstract

In recent years, Transformer-based models (Transformers) have achieved significant success in multivariate time series forecasting (MTSF). However, previous works focus on extracting features from either the time domain or the frequency domain, which inadequately captures the trends and periodic characteristics. To address this issue, we propose a wavelet learning framework to model complex temporal dependencies of the time series data. The wavelet domain integrates both time and frequency information, allowing for the analysis of local characteristics of signals at different scales. Additionally, the Softmax self-attention mechanism used by Transformers has quadratic complexity, which leads to excessive computational costs when capturing long-term dependencies. Therefore, we propose a novel attention mechanism: Rotary Route Attention (RoRA). Unlike Softmax attention, RoRA utilizes rotary position embeddings to inject relative positional information into sequence tokens and introduces a small number of routing tokens $r$ to aggregate information from the $KV$ matrices and redistribute it to the $Q$ matrix, offering linear complexity. We further propose WaveRoRA, which leverages RoRA to capture inter-series dependencies in the wavelet domain. We conduct extensive experiments on eight real-world datasets. The results indicate that WaveRoRA outperforms existing state-of-the-art models while maintaining lower computational costs. Our code is available at https://github.com/Leopold2333/WaveRoRA.
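
The routing idea described in the abstract can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the two-stage softmax form, the choice to rotate only $Q$ and $K$, and the names `rope`, `routed_attention`, and `R` are all hypothetical. See the linked repository for the actual method.

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary position embedding for x of shape (n, d), d even."""
    n, d = x.shape
    freqs = base ** (-np.arange(0, d, 2) / d)          # (d/2,) rotation frequencies
    angles = np.arange(n)[:, None] * freqs[None, :]    # (n, d/2) angle per position
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                    # paired channels
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)            # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def routed_attention(Q, K, V, R):
    """Assumed two-stage routing: R gathers from (K, V), then Q reads from R."""
    d = Q.shape[-1]
    Qr, Kr = rope(Q), rope(K)
    # Stage 1: each of the r routing tokens aggregates the values -> (r, d)
    routed = softmax(R @ Kr.T / np.sqrt(d)) @ V
    # Stage 2: each query redistributes the routed summary -> (n, d)
    return softmax(Qr @ R.T / np.sqrt(d)) @ routed

rng = np.random.default_rng(0)
n, d, r = 64, 16, 4                                    # r << n routing tokens
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
R = rng.standard_normal((r, d))                        # stands in for learned tokens
out = routed_attention(Q, K, V, R)
print(out.shape)  # (64, 16)
```

Both score matrices in this sketch have shape $r \times n$ or $n \times r$, so the cost grows as $O(nrd)$ rather than the $O(n^2 d)$ of full self-attention, which is linear in $n$ for a fixed number of routing tokens.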

Paper Structure

This paper contains 30 sections, 14 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Three signals built from the same three functions, $\sin x$, $\sin 3x$ and $\sin 8x$, sampled at 20 Hz. Signal 1 is defined as $y=\sin x+\sin 3x+\sin 8x$. Signal 2 is evenly divided into 3 segments in the order $\sin x$, $\sin 3x$ and $\sin 8x$, while Signal 3 changes the order to $\sin 8x$, $\sin x$ and $\sin 3x$. These signals differ in the time domain but share similar patterns in the frequency domain. The rightmost column shows the wavelet coefficients obtained from a three-level DWT. Compared to the DFT results, the multi-level wavelet coefficients preserve the different periodic characteristics while better revealing the intervals where each periodic pattern is dominant.
  • Figure 2: The architecture of WaveRoRA. The input MTS data is first stabilized by instance normalization and then transformed to multi-scale wavelet coefficients through a $J$-level DWT. Each series of coefficients is passed through a corresponding wave embedding layer to generate uniform wavelet-wise tokens. These embeddings are then transposed to series-wise tokens and fed into $N$ layers of WaveRoRA Encoders to capture inter-series dependencies. Subsequently, the outputs are fed to a set of wave predictors to obtain the predicted wavelet coefficients, which are then passed to the IDWT to generate the final predictions.
  • Figure 3: The architecture of RoRA.
  • Figure 4: The MSE results with prediction length $H=96$ on ETTh1 and ETTh2. Varying settings of DWT decomposition level $J$ and dropout values are applied. The model's prediction accuracy improves and stabilizes with a larger $J$, while different dropout values have a minimal effect on performance.
  • Figure 5: The MSE results with prediction length $H\in\{96,192,336,720\}$ on Electricity and ETTh2.
  • ...and 2 more figures
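
The comparison described in the Figure 1 caption can be reproduced approximately with NumPy alone. The sketch below rests on several assumptions that are not taken from the paper: the signal length (480 samples), the use of the Haar wavelet, and the helper names `segments`, `haar_dwt`, and `haar_idwt`.

```python
import numpy as np

fs = 20.0                       # sampling rate stated in the caption (Hz)
n = 480                         # assumed length: divisible by 3 and by 2**3
t = np.arange(n) / fs
parts = [np.sin(t), np.sin(3 * t), np.sin(8 * t)]

def segments(order):
    """Concatenate three equal-length segments, one function per segment."""
    seg = n // 3
    return np.concatenate(
        [parts[k][i * seg:(i + 1) * seg] for i, k in enumerate(order)]
    )

signal1 = parts[0] + parts[1] + parts[2]   # all three components at once
signal2 = segments([0, 1, 2])              # sin x, then sin 3x, then sin 8x
signal3 = segments([2, 0, 1])              # sin 8x, then sin x, then sin 3x

def haar_dwt(x, levels):
    """Orthonormal Haar DWT; returns [cA_J, cD_J, ..., cD_1]."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))   # detail (high-pass) band
        approx = (even + odd) / np.sqrt(2)          # approximation (low-pass) band
    return [approx] + details[::-1]

def haar_idwt(coeffs):
    """Inverse of haar_dwt for the same coefficient ordering."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2)
        out[1::2] = (approx - detail) / np.sqrt(2)
        approx = out
    return approx

for name, sig in [("signal2", signal2), ("signal3", signal3)]:
    d1 = haar_dwt(sig, 3)[-1]                       # finest-scale detail band
    thirds = [float((c ** 2).sum()) for c in np.split(d1, 3)]
    print(name, thirds)                             # detail energy per time third
```

In this sketch the finest detail band carries the most energy in whichever third contains $\sin 8x$, so the coefficients indicate when each pattern is active as well as which patterns are present. A DFT magnitude spectrum computed over the whole signal does not give that localization.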