Ister: Inverted Seasonal-Trend Decomposition Transformer for Explainable Multivariate Time Series Forecasting

Fanpu Cao, Shu Yang, Zhengjian Chen, Ye Liu, Laizhong Cui

TL;DR

Ister tackles long-horizon multivariate forecasting by pairing a hierarchical seasonal-trend decomposition with a Dual Transformer backbone and a novel Dot-attention mechanism. The model first decomposes each time series into seasonal and trend parts, further splits the seasonal signal into multiple periodic components, and then jointly models inter-series dependencies and multi-periodicity to improve predictive accuracy. Dot-attention provides linear-time channel weighting and makes the contribution of each component directly inspectable, while keeping inference efficient. Experiments on six real-world datasets show state-of-the-art results, with up to a 10% improvement in MSE, along with visualizations of which components drive predictions, making Ister practical for real-world deployment and analysis.
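The "linear-time channel weighting" idea can be illustrated with a minimal sketch. It is not the paper's Dot-attention formulation, which is not given here. It assumes a single hypothetical learned `query` vector that is dotted with each channel token, followed by a softmax, so the cost grows with the number of channels N rather than with N squared, and the weights form a probability distribution over channels.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def channel_weights(tokens, query):
    """Score each channel token against one shared query vector.

    tokens: list of N channel embeddings, each a list of d floats.
    query:  hypothetical learned vector of d floats (an assumption of this sketch).
    Cost is O(N * d): one dot product per channel, no N x N attention map.
    """
    scores = [sum(t_k * q_k for t_k, q_k in zip(t, query)) for t in tokens]
    return softmax(scores)

def weighted_pool(tokens, weights):
    """Combine channel tokens into one d-dimensional summary vector."""
    d = len(tokens[0])
    return [sum(w * t[k] for w, t in zip(weights, tokens)) for k in range(d)]

# Toy example: three channels with 2-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
query = [1.0, 0.0]
weights = channel_weights(tokens, query)
summary = weighted_pool(tokens, weights)
```

Because the weights sum to one, they can be read as per-channel contributions, which is the kind of distribution the paper visualizes.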

Abstract

In long-term time series forecasting, Transformer-based models have achieved great success, due to their ability to capture long-range dependencies. However, existing models face challenges in identifying critical components for prediction, leading to limited interpretability and suboptimal performance. To address these issues, we propose the Inverted Seasonal-Trend Decomposition Transformer (Ister), a novel Transformer-based model for multivariate time series forecasting. Ister decomposes time series into seasonal and trend components, further modeling multi-periodicity and inter-series dependencies using a Dual Transformer architecture. We introduce a novel Dot-attention mechanism that improves interpretability, computational efficiency, and predictive accuracy. Comprehensive experiments on benchmark datasets demonstrate that Ister outperforms existing state-of-the-art models, achieving up to a 10% improvement in MSE. Moreover, Ister enables intuitive visualization of component contributions, shedding light on the model's decision process and enhancing transparency in prediction results.
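As a rough illustration of the seasonal-trend split, the sketch below uses a centered moving average as the trend estimate and treats the residual as the seasonal part. The moving-average choice, the `window` parameter, and the edge padding are assumptions borrowed from common practice, not details confirmed by the abstract.

```python
import math

def moving_average(series, window):
    """Centered moving average; the ends are padded by repeating edge values."""
    half = window // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * (window - 1 - half)
    return [sum(padded[i:i + window]) / window for i in range(len(series))]

def decompose(series, window=25):
    """Return (seasonal, trend) so that seasonal[i] + trend[i] == series[i]."""
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return seasonal, trend

# Toy series: a linear trend plus a daily (period-24) cycle over 96 steps.
series = [0.1 * t + math.sin(2 * math.pi * t / 24) for t in range(96)]
seasonal, trend = decompose(series, window=25)
```

The two parts always add back to the input, so any downstream model that forecasts them separately can recombine its outputs by summation.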

Paper Structure

This paper contains 41 sections, 2 theorems, 13 equations, 8 figures, 8 tables, 2 algorithms.

Key Result

Theorem 1

Let $X = \{x_1, x_2, \dots, x_N\}$ be a set of elements where $x_i \in \mathbb{R}^d$, and let $f: 2^{\mathbb{R}^d} \to \mathbb{R}^m$ be a function operating on the set $X$. If $f$ is permutation-invariant to the input set $X$, then $f$ can be decomposed as $$f(X) = \rho\left(\sum_{i=1}^{N} \phi(x_i)\right),$$ where $\phi: \mathbb{R}^d \to \mathbb{R}^m$ is a learned function that maps each element of the set to an intermediate representation and $\rho$ ...
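A small numerical check can make the DeepSets form concrete. The sketch below illustrates only the easy direction: any function of the form $\rho(\sum_i \phi(x_i))$ is permutation-invariant, because summation ignores order. The particular `phi` and `rho` are hypothetical stand-ins for learned networks and are not taken from the paper.

```python
import random

def phi(x):
    """Hypothetical per-element map R^2 -> R^2 (stands in for a learned network)."""
    return [x[0] * x[0], x[0] * x[1]]

def rho(z):
    """Hypothetical readout applied to the pooled representation."""
    return [z[0] + 1.0, 2.0 * z[1]]

def f(xs):
    """Sum-decomposed set function: rho(sum_i phi(x_i))."""
    pooled = [0.0, 0.0]
    for x in xs:
        pooled = [a + b for a, b in zip(pooled, phi(x))]
    return rho(pooled)

xs = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]

# Reorder the same elements with a fixed seed; f should not change.
shuffled = xs[:]
random.Random(0).shuffle(shuffled)

same = all(abs(a - b) < 1e-9 for a, b in zip(f(xs), f(shuffled)))
```

Here `same` compares the outputs for the original and the reordered input up to a small floating-point tolerance.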

Figures (8)

  • Figure 1: Overall structure of Ister. The pipeline of Ister consists of several key stages: data preprocessing, embedding, backbone, and final output. Upon completion of training phase, in addition to generating predictions for future sequences, Ister provides users with the ability to examine the contribution of each component to the final prediction, presented in the form of a probability distribution.
  • Figure 2: Attention heatmaps of a 3-layer iTransformer after 10 epochs of training on the ECL and Traffic datasets.
  • Figure 3: We used a 3-layer Ister trained on ECL and Traffic for 10 epochs and demonstrated the importance of each channel learned by Dot-attention to the overall prediction on the corresponding test set.
  • Figure 4: The importance of each periodic component learned by Dot-attention to the overall prediction. The labels on the x-axis indicate the different periodic components. For example, 24(1) indicates the 1st component obtained by splitting the sequence with a period of 24, and so on. The numbers in red give the index range of the time steps contained in the corresponding periodic components.
  • Figure 5: Forecasting performance with the look-back length varied over {48, 96, 192, 336, 720} and the prediction length varied over {96, 192, 336, 720}. Different line styles represent different prediction lengths. Ister's forecasting performance benefits from a longer look-back length.
  • ...and 3 more figures
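
The component labels in the Figure 4 caption, such as 24(1), can be reproduced with a short sketch. It assumes that "splitting the sequence with a period of 24" means cutting the look-back window into contiguous, non-overlapping segments of 24 steps, and it uses a look-back length of 96 for illustration; the paper's exact splitting rule may differ. The helper name `split_by_period` is hypothetical.

```python
def split_by_period(series, period):
    """Split a series into contiguous, non-overlapping segments of length `period`.

    Returns a list of (label, (start, end), values) tuples, with `end` inclusive.
    A trailing remainder shorter than `period` is dropped in this sketch.
    """
    components = []
    for k in range(len(series) // period):
        start = k * period
        end = start + period - 1
        label = f"{period}({k + 1})"  # e.g. "24(1)" for the first segment
        components.append((label, (start, end), series[start:end + 1]))
    return components

# Toy look-back window of 96 steps, using the step index as the value.
look_back = list(range(96))
components = split_by_period(look_back, 24)
labels = [c[0] for c in components]
index_ranges = [c[1] for c in components]
```

Each tuple pairs a label with the index range it covers, which is the kind of information the figure reports next to each component.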

Theorems & Definitions (4)

  • Theorem 1: DeepSets
  • Lemma 1: Proof is given in Appendix
  • Definition 1
  • proof