UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting

Juncheng Liu, Chenghao Liu, Gerald Woo, Yiwei Wang, Bryan Hooi, Caiming Xiong, Doyen Sahoo

TL;DR

UniTST addresses the challenge of simultaneously modeling inter-series and intra-series dependencies in multivariate time series forecasting. It introduces a unified attention mechanism that flattens patch tokens from all variates into a single sequence, coupled with a lightweight dispatcher to reduce memory complexity, enabling scalable learning of cross-time cross-variate patterns. Across 13 real-world datasets, UniTST achieves state-of-the-art results on both long- and short-term forecasting, demonstrating the value of explicit cross-variate temporal modeling. The work highlights practical benefits for forecasting in complex multivariate settings and suggests avenues for handling very long sequences in future research.
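Here is a minimal NumPy sketch of the flattening idea. It assumes strided patching, a linear patch embedding, and a single attention head with no variate or position embeddings. The helper names (`patchify`, `unified_attention`) and all sizes are hypothetical and not taken from the paper. Every patch from every variate becomes one token, so one attention map covers both time and variates:

```python
import numpy as np

def patchify(x: np.ndarray, patch_len: int, stride: int) -> np.ndarray:
    """Split each variate of x (shape [N, T]) into patches of shape [N, P, patch_len]."""
    _, t = x.shape
    n_patches = (t - patch_len) // stride + 1
    # Row p of idx holds the time indices covered by patch p.
    idx = stride * np.arange(n_patches)[:, None] + np.arange(patch_len)[None, :]
    return x[:, idx]

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def unified_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over all N*P tokens at once (hypothetical helper)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # [N*P, N*P]: any patch can attend to any patch
    return softmax(scores) @ v, scores.shape

rng = np.random.default_rng(0)
N, T, patch_len, stride, d = 7, 96, 16, 8, 32  # illustrative sizes
x = rng.standard_normal((N, T))
patches = patchify(x, patch_len, stride)  # [7, 11, 16]
w_embed = rng.standard_normal((patch_len, d)) / np.sqrt(patch_len)
tokens = (patches @ w_embed).reshape(-1, d)  # flatten variates x patches -> [77, 32]
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out, score_shape = unified_attention(tokens, w_q, w_k, w_v)
print(tokens.shape, score_shape)  # (77, 32) (77, 77)
```

The score matrix is $(NP) \times (NP)$. Memory therefore grows quadratically in the number of variates times the number of patches, which is the cost the dispatcher is meant to reduce.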

Abstract

Transformer-based models have emerged as powerful tools for multivariate time series forecasting (MTSF). However, existing Transformer models often fail to capture the intricate dependencies that span both the variate and temporal dimensions of MTS data. Some recent models capture variate and temporal dependencies separately, through either two sequential or two parallel attention mechanisms. However, these methods cannot directly and explicitly learn the intricate inter-series and intra-series dependencies. In this work, we first demonstrate that these dependencies are important because they usually exist in real-world data. To model them directly, we propose UniTST, a Transformer-based model with a unified attention mechanism over the flattened patch tokens. Additionally, we add a dispatcher module that reduces the complexity and makes the model feasible for a potentially large number of variates. Although our proposed model employs a simple architecture, it offers compelling performance in extensive experiments on several time series forecasting datasets.
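The dispatcher's role can be illustrated as a two-step attention through a small set of $k$ extra tokens. First, the dispatchers gather information from all $N \cdot P$ patch tokens. Then every token reads back from the dispatchers. This is a hedged sketch in the spirit of router-style attention. The number of dispatchers, the omission of learned Q/K/V projections, and the function names are assumptions, not the paper's exact module:

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    """Scaled dot-product attention; returns the output and the score-matrix shape."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v, scores.shape

def dispatcher_attention(tokens, dispatchers):
    """Two-step attention through k dispatcher tokens (hypothetical sketch).

    tokens:      [L, d] flattened patch tokens, L = N * P
    dispatchers: [k, d] dispatcher embeddings, k << L
    """
    # Step 1: dispatchers gather from all L tokens -> [k, d]; scores are [k, L].
    gathered, s1 = attend(dispatchers, tokens, tokens)
    # Step 2: every token reads back from the k dispatchers -> [L, d]; scores are [L, k].
    out, s2 = attend(tokens, gathered, gathered)
    return out, (s1, s2)

rng = np.random.default_rng(0)
N, P, d, k = 321, 11, 32, 10  # illustrative: 321 variates, 11 patches each, 10 dispatchers
L = N * P
tokens = rng.standard_normal((L, d))
dispatchers = rng.standard_normal((k, d))  # stand-in for learnable dispatcher embeddings
out, (s1, s2) = dispatcher_attention(tokens, dispatchers)
print(out.shape, s1, s2)  # (3531, 32) (10, 3531) (3531, 10)
print(L * L, 2 * k * L)   # 12467961 70620 (full vs. dispatcher score entries)
```

No $L \times L$ matrix is ever formed. For a fixed $k$, the score memory grows linearly in $NP$ instead of quadratically.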

Paper Structure (28 sections, 5 equations, 9 figures, 6 tables)

Figures (9)

  • Figure 1: Comparison between our model and previous models. Previous models apply time-wise and variate-wise attention modules either sequentially or in parallel, so they cannot simultaneously capture cross-time cross-variate dependencies (i.e., green links) as our model does.
  • Figure 2: Explicit correlation between two sub-series from two different variates at different periods (i.e., strong correlation between period 1 of variate 1 and period 2 of variate 2).
  • Figure 3: Correlation between patches from different variates. x-axis: patch indices in variate 10, y-axis: patch indices in variate 0.
  • Figure 4: Framework Overview. We flatten the patches from all variates into a sequence as the input of the Transformer Encoder and replace the original self-attention with the proposed unified attention with dispatchers to reduce the memory complexity.
  • Figure 5: Performance with different lookback lengths and fixed prediction length $S = 96$.
  • ...and 4 more figures
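Figures 2 and 3 rest on correlations between patches from different variates. The hypothetical helper below computes a patch-by-patch Pearson correlation matrix between two series. It assumes non-overlapping patches and uses synthetic data in which "variate 10" is "variate 0" delayed by two patches. As a result, the strongest correlations sit off the diagonal, that is, at different time positions in different variates. This is not the authors' analysis code or data:

```python
import numpy as np

def patch_correlation(x_a: np.ndarray, x_b: np.ndarray, patch_len: int) -> np.ndarray:
    """C[i, j] = Pearson correlation between patch i of x_a and patch j of x_b.

    Uses non-overlapping patches; any tail shorter than patch_len is dropped.
    """
    pa = x_a[: len(x_a) // patch_len * patch_len].reshape(-1, patch_len)
    pb = x_b[: len(x_b) // patch_len * patch_len].reshape(-1, patch_len)
    c = np.corrcoef(np.vstack([pa, pb]))  # each row (one patch) is treated as one variable
    return c[: len(pa), len(pa):]         # off-diagonal block: x_a patches vs. x_b patches

rng = np.random.default_rng(0)
patch_len, n_patches = 16, 12
x0 = rng.standard_normal(patch_len * n_patches)  # synthetic "variate 0"
# Synthetic "variate 10": variate 0 delayed by two patches (np.roll wraps around), plus noise.
x10 = np.roll(x0, 2 * patch_len) + 0.1 * rng.standard_normal(x0.shape)
C = patch_correlation(x0, x10, patch_len)  # rows: variate 0, columns: variate 10
print(C.shape)           # (12, 12)
print(C.argmax(axis=1))  # best match for patch i of variate 0 is patch (i + 2) % 12 of variate 10
```

Plotting `C` as a heatmap gives a picture in the style of Figure 3, with a bright band shifted off the main diagonal.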

Theorems & Definitions (1)

  • Definition 1: Cross-Time Cross-Variate Correlation Coefficient
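The exact statement of Definition 1 is not reproduced here. One reading consistent with Figure 3 is the Pearson correlation between a patch of one variate and a patch at a different position in another variate. The symbols below ($p$ for patch length, $u, v$ for variates, $i, j$ for patch indices) are assumptions introduced for illustration:

```latex
% Assumed form for illustration; not quoted from the paper's Definition 1.
% x_{u}^{(i)} \in R^p: i-th patch (length p) of variate u; \bar{x}_{u}^{(i)}: its mean.
\rho\bigl(\mathbf{x}_{u}^{(i)}, \mathbf{x}_{v}^{(j)}\bigr)
= \frac{\sum_{t=1}^{p} \bigl(x_{u,t}^{(i)} - \bar{x}_{u}^{(i)}\bigr)\bigl(x_{v,t}^{(j)} - \bar{x}_{v}^{(j)}\bigr)}
       {\sqrt{\sum_{t=1}^{p} \bigl(x_{u,t}^{(i)} - \bar{x}_{u}^{(i)}\bigr)^{2}}
        \sqrt{\sum_{t=1}^{p} \bigl(x_{v,t}^{(j)} - \bar{x}_{v}^{(j)}\bigr)^{2}}},
\qquad u \neq v,\; i \neq j .
```

Under this reading, a value of $|\rho|$ close to 1 for some $u \neq v$ and $i \neq j$ marks a cross-time cross-variate dependency. Time-wise or variate-wise attention alone cannot link such a pair directly.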