Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling
Guoqi Yu, Jing Zou, Xiaowei Hu, Angelica I. Aviles-Rivero, Jing Qin, Shujun Wang
TL;DR
The paper addresses multivariate time series forecasting by jointly modeling inter-series dependencies and intra-series variations. It introduces Leddam, which combines two components: a Learnable Decomposition (LD) that replaces untrainable moving averages with a Gaussian-initialized 1D convolution to split a series into $X_{Trend}$ and $X_{Seasonal}$, and a Dual Attention Module comprising channel-wise self-attention and autoregressive self-attention within a Transformer backbone. The authors report state-of-the-art results across eight real-world datasets and show that plugging LD into other models reduces their MSE by 11.87% to 48.56%. The work suggests practical value for forecasting in domains with complex trend structures and inter-variable relationships.
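To make the decomposition step concrete, here is a minimal sketch of splitting one series into trend and seasonal parts with a Gaussian-weighted 1D convolution. It is an illustration under stated assumptions, not the authors' implementation: the function names, the kernel size of 5, `sigma = 1.0`, and the edge-replication padding are all hypothetical choices, and the code is plain Python for readability. In a learnable variant the kernel weights would be trainable parameters that merely start from these Gaussian values; the sketch covers only the initialization and the forward pass.

```python
import math

def gaussian_kernel(size, sigma):
    """Return `size` Gaussian weights centred on the middle tap, normalized to sum to 1."""
    center = (size - 1) / 2
    weights = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

def decompose(series, kernel):
    """Split one series into (trend, seasonal) with a 1D convolution.

    The edges are padded by repeating the first and last values so the
    trend keeps the input length. Assumes an odd kernel size.
    """
    half = len(kernel) // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    trend = [
        sum(w * padded[t + k] for k, w in enumerate(kernel))
        for t in range(len(series))
    ]
    # The seasonal part is whatever the smoothed trend does not explain.
    seasonal = [x - tr for x, tr in zip(series, trend)]
    return trend, seasonal

# Toy input: a linear trend plus a period-4 oscillation (illustrative data only).
kernel = gaussian_kernel(size=5, sigma=1.0)
series = [0.1 * t + math.sin(2 * math.pi * t / 4) for t in range(24)]
trend, seasonal = decompose(series, kernel)
```

By construction, `trend` and `seasonal` sum back to the input at every time step. A fixed moving average corresponds to the special case of uniform weights that are never updated.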
Abstract
Multivariate time series forecasting is crucial and demands precise modeling of intricate patterns, including inter-series dependencies and intra-series variations. Each time series has distinctive trend characteristics, and existing methods that rely on basic moving average kernels may struggle with the non-linear structure and complex trends of real-world data. We therefore introduce a learnable decomposition strategy that captures dynamic trend information more effectively. We also propose a dual attention module, implemented with channel-wise self-attention and autoregressive self-attention, that captures inter-series dependencies and intra-series variations simultaneously. To evaluate our method, we conducted experiments on eight open-source datasets and compared it with state-of-the-art methods. Our Leddam (LEarnable Decomposition and Dual Attention Module) not only delivers significant gains in predictive performance; its decomposition strategy can also be plugged into other methods, reducing their MSE by 11.87% to 48.56%.
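As a rough illustration of what attending across channels means, the sketch below applies generic scaled dot-product self-attention in which each variable's series is treated as one token. This is an assumption-laden toy, not the paper's module: the abstract does not specify the projections or how tokens are built, so queries, keys, and values here are simply the raw channels, and the names `softmax` and `channel_attention` are hypothetical. The autoregressive self-attention branch is not sketched because its construction is not described here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def channel_attention(x):
    """Scaled dot-product self-attention across channels.

    `x` is a list of channels, each a list of equal length, and every
    channel acts as one token. Queries, keys, and values are the raw
    channels: no learned projections (an assumption of this sketch).
    """
    dim = len(x[0])
    out = []
    for query in x:
        # Similarity of this channel to every channel, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in x]
        weights = softmax(scores)
        # Each output channel is a weighted mix of all input channels.
        mixed = [sum(w * ch[t] for w, ch in zip(weights, x)) for t in range(dim)]
        out.append(mixed)
    return out

# Toy input: channels 0 and 1 move together, channel 2 moves opposite.
channels = [
    [1.0, 0.0, 1.0, 0.0],
    [0.9, 0.1, 1.1, -0.1],
    [-1.0, 0.0, -1.0, 0.0],
]
mixed = channel_attention(channels)
```

In this toy input the first two channels are strongly correlated, so each draws most of its output from the other and from itself, while the anti-correlated third channel contributes little to them.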
