Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling

Guoqi Yu; Jing Zou; Xiaowei Hu; Angelica I. Aviles-Rivero; Jing Qin; Shujun Wang

TL;DR

The paper addresses multivariate time series forecasting by jointly modeling inter-series dependencies and intra-series variations. It introduces Leddam, which combines two components within a Transformer backbone: a Learnable Decomposition (LD) that replaces untrainable moving averages with a Gaussian-initialized 1D convolution to split a series into $X_{Trend}$ and $X_{Seasonal}$, and a Dual Attention Module comprising channel-wise self-attention and auto-regressive self-attention. The approach is reported to achieve state-of-the-art results across eight real-world datasets, and plugging LD into other models reduces their MSE by 11.87% to 48.56%, which indicates that the decomposition generalizes across architectures. The work suggests practical impact on forecast accuracy in domains with complex trend structures and inter-variable relationships.
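The decomposition idea can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the kernel size, the value of `sigma`, the edge-replication padding, and the definition of the seasonal part as the residual (series minus trend) are all assumptions made for illustration, and the function names `gaussian_kernel` and `decompose` are hypothetical. In a trainable model the kernel weights would be parameters updated by gradient descent; here they stay at their Gaussian initial values.

```python
import math

def gaussian_kernel(size, sigma):
    """Return `size` Gaussian weights, normalised to sum to 1.

    These are only the *initial* weights; a learnable decomposition
    would treat them as trainable parameters (assumption: odd `size`).
    """
    centre = (size - 1) / 2
    raw = [math.exp(-((i - centre) ** 2) / (2 * sigma ** 2)) for i in range(size)]
    total = sum(raw)
    return [w / total for w in raw]

def decompose(series, kernel):
    """Split one series into (trend, seasonal) with a 1D convolution.

    Edge values are repeated so the trend keeps the input length.
    The seasonal part is taken as the residual (an assumption here).
    """
    half = len(kernel) // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    trend = [
        sum(w * padded[t + k] for k, w in enumerate(kernel))
        for t in range(len(series))
    ]
    seasonal = [x - m for x, m in zip(series, trend)]
    return trend, seasonal

kernel = gaussian_kernel(size=5, sigma=1.0)
series = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0, 6.0, 8.0]
trend, seasonal = decompose(series, kernel)
```

Unlike a moving average, which weights every point in the window equally, the Gaussian start gives nearby points more weight, and training can then reshape the kernel per dataset.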

Abstract

Predicting multivariate time series is crucial and demands precise modeling of intricate patterns, including inter-series dependencies and intra-series variations. The distinctive trend characteristics of each time series pose challenges, and existing methods that rely on basic moving average kernels may struggle with the non-linear structure and complex trends in real-world data. We therefore introduce a learnable decomposition strategy to capture dynamic trend information more faithfully. Additionally, we propose a dual attention module, implemented with channel-wise self-attention and autoregressive self-attention, that captures inter-series dependencies and intra-series variations simultaneously for better time series forecasting. To evaluate the effectiveness of our method, we conducted experiments on eight open-source datasets and compared it with state-of-the-art methods. In these comparisons, our Leddam (LEarnable Decomposition and Dual Attention Module) shows significant gains in predictive performance, and the proposed decomposition strategy can be plugged into other methods for a large performance boost, reducing MSE by 11.87% to 48.56%.
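To make "channel-wise self-attention" concrete, the following sketch treats each channel (variable) as one token and applies scaled dot-product attention across channels. It is a simplified illustration under stated assumptions, not the paper's module: the raw series stands in for the whole-series embedding, the query, key, and value projections are replaced by the identity, and the names `softmax` and `channel_attention` are hypothetical helpers.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def channel_attention(tokens):
    """Scaled dot-product self-attention with one token per channel.

    tokens: list of C vectors of length d (one vector per channel).
    Assumption: identity Q/K/V projections instead of learned ones.
    Returns the C x C attention weights and the mixed channel vectors.
    """
    d = len(tokens[0])
    scores = [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        for q in tokens
    ]
    weights = [softmax(row) for row in scores]
    mixed = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return weights, mixed

# Three channels over four time steps: channels 0 and 1 move together,
# channel 2 moves in the opposite direction.
channels = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
    [-1.0, -2.0, -3.0, -4.0],
]
weights, mixed = channel_attention(channels)
```

Each row of `weights` describes how strongly one channel attends to every other channel, which is the inter-series dependency the abstract refers to. In this toy input, channel 0 attends equally to channels 0 and 1 and almost not at all to channel 2.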

Paper Structure (26 sections, 8 equations, 19 figures, 15 tables)

Introduction
Related work
Methodology
Problem Definition
Learnable Decomposition Module
Dual Attention Module
Experiments
Experimental Settings
Experiments Results
Model Analysis
Learnable Decomposition Generalization Analysis
Conclusion
Experimental Details
Dataset Statistics
Implementation Details and Model Parameters
...and 11 more sections

Figures (19)

Figure 1: (a) Demonstration of inter-series dependencies and intra-series variations. (b) Visualization of different decomposition schemes in Electricity data. RAW means the raw time series. MOV means moving average kernel, and LD means our learnable decomposition module.
Figure 2: Overall structure of proposed Leddam. We start by embedding the time series and incorporating positional encoding. Then, the time series is decomposed into its trend and seasonal parts, each addressed through distinct methodologies. Finally, the processed outcomes of these two components are aggregated to obtain the ultimate predictive result.
Figure 3: General process of the 'Dual Attention Module', which handles inter-series dependencies and intra-series variations, respectively. 'Channel-wise self-attention' embeds the whole series of a channel to generate a 'Whole Series Embedding', and transformer encoders are employed to model inter-series dependencies. 'Auto-regressive self-attention' generates an 'Auto-regressive Embedding' and likewise uses transformer encoders to model intra-series variations.
Figure 4: Predictive performance comparison (MSE) of 'Channel-wise self-attention', 'Auto-regressive self-attention', 'Patch-wise self-attention', and 'Point-wise self-attention' across the ETTh2, ETTm2, and Traffic datasets. The prediction horizon is uniformly set at $F = 96$, and the input length at $T = 96$.
Figure 5: Trend-Seasonal Decomposition Results obtained by LD (Red) and MOV (Blue) on ETTh1.
...and 14 more figures
