WDformer: A Wavelet-based Differential Transformer Model for Time Series Forecasting
Xiaojian Wang, Chaoli Zhang, Zhonglong Zheng, Yunliang Jiang
TL;DR
WDformer tackles the challenge of leveraging both time-domain and frequency-domain information for multivariate time series forecasting. It applies a multi-level Discrete Wavelet Transform to extract cross-scale features, followed by a wavelet-aware embedding and a differential attention Transformer. The differential attention mechanism reduces attention noise, and forecasts are reconstructed via the inverse wavelet transform, enabling robust long-horizon predictions. Empirical results on eight real-world datasets show state-of-the-art performance on several benchmarks, and ablations confirm that both the wavelet embedding and differential attention contribute positively. The method improves forecast accuracy across diverse domains while keeping computational complexity scalable, and the code is publicly available for reproducibility.
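The multi-level wavelet decomposition and inverse reconstruction described above can be illustrated with a minimal NumPy sketch. It assumes an orthonormal Haar wavelet and a lookback length divisible by 2^levels; the wavelet family and decomposition depth that WDformer actually uses are not stated in this section, and the helper names `haar_dwt`, `haar_idwt`, `wavedec`, and `waverec` are hypothetical. The sketch shows how each variable's series splits into one coarse approximation band plus several detail bands at different scales, and that the inverse transform recovers the input exactly.

```python
import numpy as np

# Hypothetical helpers: an orthonormal Haar DWT is assumed; WDformer's actual
# wavelet family and decomposition depth are not given in this section.


def haar_dwt(x):
    """One Haar level along the last axis (its length must be even)."""
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass: smooth trend component
    detail = (even - odd) / np.sqrt(2.0)  # high-pass: local fluctuations
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one Haar level by rebuilding and interleaving even/odd samples."""
    out = np.empty(approx.shape[:-1] + (2 * approx.shape[-1],))
    out[..., 0::2] = (approx + detail) / np.sqrt(2.0)
    out[..., 1::2] = (approx - detail) / np.sqrt(2.0)
    return out


def wavedec(x, levels):
    """Multi-level decomposition returning [cA_J, cD_J, ..., cD_1]."""
    approx, details = x, []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)  # finest scale is appended first
    return [approx] + details[::-1]  # reorder from coarsest to finest


def waverec(coeffs):
    """Inverse of wavedec: fold the detail bands back in, coarsest first."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_idwt(approx, detail)
    return approx


rng = np.random.default_rng(0)
x = rng.standard_normal((7, 96))  # illustrative: 7 variables, lookback of 96 steps
coeffs = wavedec(x, levels=3)
print([c.shape for c in coeffs])  # [(7, 12), (7, 12), (7, 24), (7, 48)]
print(np.allclose(waverec(coeffs), x))  # True
```

Haar is used only because its filters fit in two lines; PyWavelets offers `pywt.wavedec` and `pywt.waverec` for other wavelet families and boundary-handling modes.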
Abstract
Time series forecasting has many applications, such as meteorological rainfall prediction, traffic flow analysis, financial forecasting, and operational load monitoring across various systems. Because time series data are sparse, modeling in only the time domain or only the frequency domain prevents a model from fully exploiting information from both domains. Moreover, conventional attention mechanisms applied to time series forecasting tend to over-attend to irrelevant historical information, which introduces noise into the prediction process and can bias the results. We propose WDformer, a wavelet-based differential Transformer model. WDformer uses the wavelet transform to perform a multi-resolution analysis of the time series; by exploiting the joint time-frequency representation, it extracts the key components that reflect the essential characteristics of the data. We further apply attention over inverted dimensions (treating each variable as a token), so that attention captures relationships among multiple variables. In computing attention, we introduce a differential attention mechanism that obtains the attention scores as the difference between two separate softmax attention matrices, which lets the model focus on important information and suppress noise. WDformer achieves state-of-the-art (SOTA) results on multiple challenging real-world datasets, demonstrating its accuracy and effectiveness. Code is available at https://github.com/xiaowangbc/WDformer.
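The two attention ideas in the abstract, attention over inverted dimensions and differential attention, can be combined in a small NumPy sketch. Under the inverted view, each variable's whole lookback series is linearly embedded into one token, so the attention weights form an N×N matrix over variables rather than over time steps. The differential step subtracts a second softmax map, scaled by a weight λ (`lam`), from the first. The weight λ and its fixed value of 0.5, the random projections, and the names `differential_attention` and `w_embed` are illustrative assumptions rather than WDformer's published configuration, and the wavelet embedding is omitted for brevity.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # shift by the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def differential_attention(tokens, w_q1, w_k1, w_q2, w_k2, w_v, lam=0.5):
    """tokens: (N, d_model), one token per variable (inverted dimensions).

    lam is a hypothetical fixed weight on the second softmax map (assumption).
    """
    scale = 1.0 / np.sqrt(w_q1.shape[1])
    a1 = softmax((tokens @ w_q1) @ (tokens @ w_k1).T * scale)  # (N, N) first map
    a2 = softmax((tokens @ w_q2) @ (tokens @ w_k2).T * scale)  # (N, N) second map
    diff = a1 - lam * a2  # attention mass assigned by both maps partly cancels
    return diff @ (tokens @ w_v), diff


rng = np.random.default_rng(0)
L, N, d_model, d_head = 96, 7, 32, 16  # illustrative sizes
x = rng.standard_normal((L, N))  # lookback window: L time steps x N variables
w_embed = rng.standard_normal((L, d_model)) / np.sqrt(L)  # hypothetical embedding
tokens = x.T @ w_embed  # invert: each variable's full series -> one token, (N, d_model)
w_q1, w_k1, w_q2, w_k2, w_v = (
    rng.standard_normal((d_model, d_head)) / np.sqrt(d_model) for _ in range(5)
)
out, diff = differential_attention(tokens, w_q1, w_k1, w_q2, w_k2, w_v, lam=0.5)
print(out.shape, diff.shape)  # (7, 16) (7, 7)
print(np.allclose(diff.sum(axis=1), 0.5))  # True: each row sums to 1 - lam
```

Because each softmax row sums to 1, every row of the differential map sums to 1 − λ and individual entries may be negative; the intuition is that weight both maps place on the same irrelevant variables cancels, leaving sharper scores on the informative ones.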
