Enhancing Multivariate Time Series Forecasting with Mutual Information-driven Cross-Variable and Temporal Modeling

Shiyi Qi, Liangjian Wen, Yiduo Li, Yuanhang Yang, Zhe Li, Zhongwen Rao, Lujia Pan, Zenglin Xu

TL;DR

This work introduces the Cross-variable Decorrelation Aware feature Modeling (CDAM) for Channel-mixing approaches, aiming to refine Channel-mixing by minimizing redundant information between channels while enhancing relevant mutual information, and introduces the Temporal correlation Aware Modeling (TAM) to exploit temporal correlations.

Abstract

Recent advancements have underscored the impact of deep learning techniques on multivariate time series forecasting (MTSF). Generally, these techniques are bifurcated into two categories: Channel-independence and Channel-mixing approaches. Although Channel-independence methods typically yield better results, Channel-mixing could theoretically offer improvements by leveraging inter-variable correlations. Nonetheless, we argue that the integration of uncorrelated information in channel-mixing methods could curtail the potential enhancement in MTSF model performance. To substantiate this claim, we introduce the Cross-variable Decorrelation Aware feature Modeling (CDAM) for Channel-mixing approaches, aiming to refine Channel-mixing by minimizing redundant information between channels while enhancing relevant mutual information. Furthermore, we introduce the Temporal correlation Aware Modeling (TAM) to exploit temporal correlations, a step beyond conventional single-step forecasting methods. This strategy maximizes the mutual information between adjacent sub-sequences of both the forecasted and target series. Combining CDAM and TAM, our novel framework significantly surpasses existing models, including those previously considered state-of-the-art, in comprehensive tests.

Paper Structure (23 sections, 22 equations, 5 figures, 9 tables)

Figures (5)

  • Figure 1: (a) The framework of Channel-independence models and Channel-mixing models. Given historical series $X=\{X^i\}$, where $i$ denotes the channel index, the Channel-mixing model tends to maximize the mutual information between $X$ and the latent representation $Z^{i}$. It also maximizes the mutual information between $Z^{i}$ and the $i$-th future series $Y^{i}$. The Channel-independence models maximize the mutual information between the $i$-th historical series $X^{i}$ and $Z^{i}$ while ignoring the mutual information between $Z^{i}$ and other channels; (b) Traffic flow of 5 adjacent detectors in the PEMS08 dataset; and (c) Prediction results of a Channel-independence model (PatchTST), a Channel-mixing model (Informer), and Informer with our framework, respectively.
  • Figure 2: Architecture of TAM with 4$\times$ downsampling. We downsample the target series and the series forecasted by a single forecaster into four sub-sequences each. We then maximize the mutual information between the adjacent sub-sequences of the forecasted series and the target series.
  • Figure 3: Evaluation of the hyper-parameters $\beta$ and $\lambda$. We evaluate the impact of $\beta$ with Informer and Stationary, and the impact of $\lambda$ with PatchTST and RMLP, all on the ETTh1 dataset.
  • Figure 4: Test error for each training epoch. We train the baselines, and the same baselines integrated with our InfoTime, for 50 epochs on ETTh1, with the history and prediction lengths both set to 96.
  • Figure 5: Experimental results on synthetic data.
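
The 4$\times$ downsampling described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes interleaved (strided) downsampling, which the caption does not specify, and the helper names `strided_subsequences` and `adjacent_pairs` are hypothetical. The sketch only shows how a series might be split and how adjacent sub-sequences might be paired; the mutual information estimation used by TAM is not shown.

```python
from typing import List, Sequence, Tuple


def strided_subsequences(series: Sequence[float], factor: int = 4) -> List[List[float]]:
    """Split `series` into `factor` interleaved sub-sequences.

    Sub-sequence k holds the elements at positions k, k + factor, k + 2 * factor, ...
    This strided scheme is an assumption, not taken from the source.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return [list(series[k::factor]) for k in range(factor)]


def adjacent_pairs(subsequences: List[List[float]]) -> List[Tuple[List[float], List[float]]]:
    """Pair each sub-sequence with its successor: (0, 1), (1, 2), ..."""
    return list(zip(subsequences[:-1], subsequences[1:]))


# Toy target series of length 8, split into four sub-sequences.
target = [float(t) for t in range(8)]
subs = strided_subsequences(target, factor=4)
pairs = adjacent_pairs(subs)
```

Under this assumption, the same split would be applied to both the forecasted and the target series, and each adjacent pair would then be passed to whatever mutual information objective the model uses.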