DPWMixer: Dual-Path Wavelet Mixer for Long-Term Time Series Forecasting

Li Qianyang, Zhang Xingjun, Wang Shaoxun, Wei Jia

TL;DR

DPWMixer tackles long-term time series forecasting by integrating a Lossless Haar Wavelet Pyramid with a Dual-Path Trend Mixer to separately model macro-trends and micro-dynamics across multiple scales. An Adaptive Multi-Scale Fusion module weights predictions by channel, achieving linear time complexity while preserving high-frequency details. Empirical results across eight benchmarks show state-of-the-art performance and robustness, especially on datasets with strong seasonality or abrupt transients, while maintaining efficiency advantages over Transformer-based models. The work highlights the value of lossless, time-frequency localized decomposition and dual-path modeling for scalable, accurate LTSF, with potential extensions to learnable wavelets and self-supervised pre-training.

Abstract

Long-term time series forecasting (LTSF) is a critical task in computational intelligence. While Transformer-based models effectively capture long-range dependencies, they often suffer from quadratic complexity and overfitting due to data sparsity. Conversely, efficient linear models struggle to capture complex non-linear local dynamics. Furthermore, existing multi-scale frameworks typically rely on average pooling, which acts as a non-ideal low-pass filter, leading to spectral aliasing and the irreversible loss of high-frequency transients. In response, this paper proposes DPWMixer, a computationally efficient Dual-Path architecture. The framework is built upon a Lossless Haar Wavelet Pyramid that replaces traditional pooling, utilizing orthogonal decomposition to explicitly disentangle trends and local fluctuations without information loss. To process these components, we design a Dual-Path Trend Mixer that integrates a global linear mapping for macro-trend anchoring and a flexible patch-based MLP-Mixer for micro-dynamic evolution. Finally, an adaptive multi-scale fusion module integrates predictions from diverse scales, weighted by channel stationarity to optimize synthesis. Extensive experiments on eight public benchmarks demonstrate that our method achieves a consistent improvement over state-of-the-art baselines. The code is available at https://github.com/hit636/DPWMixer.
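To make the contrast between average pooling and a lossless Haar pyramid concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard orthonormal Haar filters with scaling $1/\sqrt{2}$; the function names (`haar_step`, `haar_pyramid`, `reconstruct`, `avg_pool`) and the shapes are hypothetical choices made for illustration.

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform along the last axis.

    x has shape (..., L) with even L. Returns (approx, detail), each (..., L/2).
    """
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass: local trend
    detail = (even - odd) / np.sqrt(2.0)   # high-pass: local fluctuation
    return approx, detail

def haar_inverse_step(approx, detail):
    """Exactly invert haar_step by re-interleaving even and odd samples."""
    x = np.empty(approx.shape[:-1] + (2 * approx.shape[-1],), dtype=approx.dtype)
    x[..., 0::2] = (approx + detail) / np.sqrt(2.0)
    x[..., 1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def haar_pyramid(x, levels):
    """Repeatedly split the approximation; keep the detail band of every level."""
    details = []
    approx = x
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

def reconstruct(approx, details):
    """Rebuild the original signal from the coarsest approximation and all details."""
    for d in reversed(details):
        approx = haar_inverse_step(approx, d)
    return approx

def avg_pool(x):
    """Stride-2 average pooling: keeps only a scaled approximation band."""
    return 0.5 * (x[..., 0::2] + x[..., 1::2])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((7, 96))            # (channels, input length)
    approx, details = haar_pyramid(x, levels=3)
    print("perfect reconstruction:", np.allclose(reconstruct(approx, details), x))
    # A sample-to-sample alternation is invisible to average pooling ...
    alt = np.tile([1.0, -1.0], 48)
    print("pooling hides it:", np.allclose(avg_pool(x + alt), avg_pool(x)))
    # ... but shows up in the first-level Haar detail coefficients.
    print("detail band sees it:", not np.allclose(haar_step(x + alt)[1], haar_step(x)[1]))
```

Because the transform is orthogonal, the signal energy is split between the approximation and detail bands rather than discarded, which is the property the abstract refers to as decomposition "without information loss". Average pooling, by contrast, maps two different inputs to the same output whenever they differ only by such an alternation.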

Paper Structure

This paper contains 25 sections, 7 equations, 10 figures, 1 table, 1 algorithm.

Figures (10)

  • Figure 1: Illustration of the intrinsic challenges in LTSF and our design intuition. (a) Spectral Aliasing (Decomposition): Standard pooling leads to aliasing and information loss, whereas our wavelet-based approach achieves lossless, orthogonal disentanglement. (b) Trend-Detail Incompatibility (Modeling): Pure linear models (Orange dashed) capture the macro-trend but fail to model local transients. Our Dual-Path architecture (Blue solid) resolves this incompatibility by harmonizing global anchoring with local refinement.
  • Figure 2: The overall architecture of DPWMixer. Top: The global pipeline showing orthogonal multi-scale decomposition and adaptive fusion. (a) Haar Wavelet Pyramid: The input is orthogonally decomposed into Approximation ($\mathbf{X}$) and Detail ($\mathbf{H}$) coefficients, preventing aliasing compared to average pooling. (b) Dual-Path Trend Mixer: A hybrid block unifying a Global Trend Path for rigid trends and a Local Evolution Path for flexible dynamics. Outputs are fused via learnable gates.
  • Figure 3: Long-term forecasting performance comparison with an input length of $L=96$ for prediction horizons $T \in \{96, 192, 336, 720\}$. Best and second-best scores are marked in bold and with an underline, respectively. 'Avg' denotes the average performance.
  • Figure 4: (continued)
  • Figure 5: Visual comparison of forecasting performance on the Electricity dataset with a horizon of $T=192$. DPWMixer (a) accurately captures cyclic patterns and maintains trend consistency over the extended horizon, exhibiting lower error accumulation than the baseline method.
  • ...and 5 more figures
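
The Dual-Path Trend Mixer described in the Figure 2 caption can be illustrated with a rough NumPy forward pass. This is a sketch under stated assumptions, not the paper's architecture: the class name `DualPathBlockSketch`, the patch length, the hidden width, the random initialization, the omission of normalization layers, and the sigmoid gate of the form $g \cdot \text{global} + (1-g) \cdot \text{local}$ are all hypothetical choices. The source only states that a global linear path and a patch-based MLP-Mixer path are fused via learnable gates.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DualPathBlockSketch:
    """Hypothetical dual-path block: a global linear map plus a patch MLP-Mixer."""

    def __init__(self, seq_len, horizon, patch_len=8, d_model=16, seed=0):
        assert seq_len % patch_len == 0, "sequence must split into whole patches"
        rng = np.random.default_rng(seed)
        self.patch_len = patch_len
        self.n_patches = seq_len // patch_len
        # Random weights stand in for trained parameters (assumption).
        init = lambda *shape: rng.standard_normal(shape) / np.sqrt(shape[0])
        self.w_global = init(seq_len, horizon)                 # global trend path
        self.w_embed = init(patch_len, d_model)                # patch embedding
        self.w_token1 = init(self.n_patches, self.n_patches)   # mixing across patches
        self.w_token2 = init(self.n_patches, self.n_patches)
        self.w_feat1 = init(d_model, d_model)                  # mixing across features
        self.w_feat2 = init(d_model, d_model)
        self.w_head = init(self.n_patches * d_model, horizon)  # projection to horizon
        self.gate_logit = np.zeros(horizon)                    # gate parameter (assumed form)

    def global_path(self, x):
        # x: (channels, seq_len) -> (channels, horizon); purely linear
        return x @ self.w_global

    def local_path(self, x):
        c = x.shape[0]
        z = x.reshape(c, self.n_patches, self.patch_len) @ self.w_embed  # (C, N, D)
        zt = z.transpose(0, 2, 1)                                        # (C, D, N)
        zt = gelu(zt @ self.w_token1) @ self.w_token2                    # token mixing
        z = z + zt.transpose(0, 2, 1)                                    # residual
        z = z + gelu(z @ self.w_feat1) @ self.w_feat2                    # feature mixing
        return z.reshape(c, -1) @ self.w_head                            # (C, horizon)

    def forward(self, x):
        g = sigmoid(self.gate_logit)  # per-horizon-step gate in (0, 1)
        return g * self.global_path(x) + (1.0 - g) * self.local_path(x)

if __name__ == "__main__":
    block = DualPathBlockSketch(seq_len=96, horizon=24)
    x = np.random.default_rng(1).standard_normal((7, 96))
    print(block.forward(x).shape)
```

Every operation here is a fixed-size matrix product over the sequence or patch axis, with no pairwise attention between time steps. In this sketch the gate lets the block fall back to the purely linear trend path when the non-linear local path is not helpful.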