WDformer: A Wavelet-based Differential Transformer Model for Time Series Forecasting

Xiaojian Wang, Chaoli Zhang, Zhonglong Zheng, Yunliang Jiang

TL;DR

WDformer tackles the challenge of leveraging both time-domain and frequency-domain information for multivariate time series forecasting by applying a multi-level Discrete Wavelet Transform to extract cross-scale features, followed by a wavelet-aware embedding and a differential attention Transformer. The approach reduces attention noise through a differential attention mechanism and reconstructs forecasts via inverse wavelet transformation, enabling robust long-horizon predictions. Empirical results on eight real-world datasets show state-of-the-art performance on several benchmarks, with ablations confirming the positive contribution of both wavelet embedding and differential attention. The method offers practical impact by improving forecast accuracy in diverse domains while maintaining scalable complexity, and code is publicly available for reproducibility.

Abstract

Time series forecasting has many applications, such as meteorological rainfall prediction, traffic flow analysis, financial forecasting, and operational load monitoring for a range of systems. Due to the sparsity of time series data, relying solely on time-domain or frequency-domain modeling limits a model's ability to fully leverage multi-domain information. Moreover, when applied to time series forecasting, traditional attention mechanisms tend to over-focus on irrelevant historical information, which may introduce noise into the prediction process and lead to biased results. We propose WDformer, a wavelet-based differential Transformer model. WDformer uses the wavelet transform to conduct a multi-resolution analysis of time series data. By leveraging a joint representation in the time-frequency domain, it extracts the key information components that reflect the essential characteristics of the data. Furthermore, we apply attention on inverted dimensions, so that the attention mechanism captures relationships between multiple variables. For the attention calculation, we introduce the differential attention mechanism, which computes the attention score as the difference between two separate softmax attention matrices. This enables the model to focus more on important information and reduces noise. WDformer achieves state-of-the-art (SOTA) results on multiple challenging real-world datasets, demonstrating its accuracy and effectiveness. Code is available at https://github.com/xiaowangbc/WDformer.
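
The differential attention idea described in the abstract, a difference between two separate softmax attention matrices applied over variable tokens, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy inputs, and the weighting scalar `lam` are assumptions made for illustration (the abstract states only that a difference of two softmax maps is taken), and learned projections, multiple heads, and normalization are omitted.

```python
import math

def softmax_rows(matrix):
    """Row-wise softmax with max-subtraction for numerical stability."""
    result = []
    for row in matrix:
        row_max = max(row)
        exps = [math.exp(value - row_max) for value in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attention_map(q, k):
    """Scaled dot-product attention map softmax(Q K^T / sqrt(d))."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    return softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])

def differential_attention(q1, k1, q2, k2, v, lam=1.0):
    """Difference of two softmax attention maps, applied to the values.

    `lam` is a hypothetical weighting scalar (an assumption of this sketch).
    Returns the attended values and the differential attention map.
    """
    map1 = attention_map(q1, k1)
    map2 = attention_map(q2, k2)
    diff = [[a - lam * b for a, b in zip(r1, r2)] for r1, r2 in zip(map1, map2)]
    return matmul(diff, v), diff

# Toy example: 3 tokens, one per variable (the "inverted" view), dimension 2.
q1 = [[0.2, 0.1], [0.4, -0.3], [0.0, 0.5]]
k1 = [[0.3, 0.2], [-0.1, 0.4], [0.6, 0.0]]
q2 = [[0.1, 0.3], [0.2, 0.2], [-0.4, 0.1]]
k2 = [[0.5, -0.2], [0.0, 0.1], [0.3, 0.3]]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

out, diff = differential_attention(q1, k1, q2, k2, v)
```

Because each softmax row sums to 1, every row of the differential map sums to `1 - lam`. Weight that both maps assign to the same token cancels in the subtraction, which is the intuition behind the noise reduction claimed above.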

Paper Structure

This paper contains 18 sections, 11 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The overall architecture of WDformer, which adopts an encoder-only structure. (a) A wavelet transform is applied to the original time series to extract features at different frequencies. Each resulting set of wavelet coefficients is then embedded independently, and the embeddings are concatenated. (b) The differential attention mechanism uses the difference between two separate $\mathrm{softmax}$ attention matrices as the attention score. (c) The output is split into segments by length so that the Inverse Discrete Wavelet Transform (IDWT) can be applied. The IDWT then reconstructs the predicted time series.
  • Figure 2: Visualization of Time Series in the ETTh1 and ETTh2 Datasets
  • Figure 3: Forecasting results of iTransformer (left) and WDformer (right) on real-world data from the ECL dataset. Both panels show the same time period.
  • Figure 4: Forecasting results of iTransformer and WDformer on real-world data from the Weather and Exchange datasets. One pair of subfigures shows the Weather dataset and the other pair shows the Exchange dataset. In each pair, the left subfigure is from iTransformer and the right subfigure is from WDformer. Each pair compares the two models over the same time period.
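
Steps (a) and (c) of the Figure 1 caption, multi-level decomposition followed by length-based segmentation and IDWT reconstruction, can be sketched as follows. This is only an illustration under stated assumptions: the Haar wavelet, the two-level depth, and all function names are choices made here and are not taken from the paper, and the embedding and Transformer stages between decomposition and reconstruction are skipped, so the concatenated coefficients stand in for the model output.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal):
    """One Haar DWT level; the input length must be even."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal

def multilevel_dwt(signal, levels):
    """Return [approx_n, detail_n, ..., detail_1], coarsest scale first."""
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        coeffs.insert(0, detail)
    coeffs.insert(0, approx)
    return coeffs

def split_by_lengths(flat, lengths):
    """Cut a flat sequence back into segments of the given lengths."""
    segments, start = [], 0
    for n in lengths:
        segments.append(flat[start:start + n])
        start += n
    return segments

def multilevel_idwt(coeffs):
    """Rebuild the signal from [approx_n, detail_n, ..., detail_1]."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_idwt(approx, detail)
    return approx

series = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 4.0, 0.0]
coeffs = multilevel_dwt(series, levels=2)
lengths = [len(c) for c in coeffs]

# Concatenate the coefficients, then segment by length and invert the transform.
flat = [value for band in coeffs for value in band]
restored = multilevel_idwt(split_by_lengths(flat, lengths))
```

Recording the per-band lengths at decomposition time is what makes the later segmentation unambiguous: without them, a flat output vector could not be mapped back to the approximation and detail bands that the inverse transform expects.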