DisenTS: Disentangled Channel Evolving Pattern Modeling for Multivariate Time Series Forecasting

Zhiding Liu, Jiqian Yang, Qingyang Mao, Yuze Zhao, Mingyue Cheng, Zhi Li, Qi Liu, Enhong Chen

TL;DR

DisenTS models the potentially diverse patterns within multivariate time series data in a decoupled manner and introduces a novel Forecaster Aware Gate module that adaptively generates routing signals according to both the forecasters' states and the input series' characteristics.

Abstract

Multivariate time series forecasting plays a crucial role in various real-world applications. Significant efforts have been made to integrate advanced network architectures and training strategies that enhance the capture of temporal dependencies, thereby improving forecasting accuracy. However, mainstream approaches typically use a single unified model with simplistic channel-mixing embeddings or cross-channel attention operations to account for the critical and intricate inter-channel dependencies. Moreover, some methods even trade capacity for robust prediction by adopting the channel-independent assumption. Nonetheless, because time series data may display distinct evolving patterns due to the unique characteristics of each channel (including multiple strong seasonalities and trend changes), unified modeling methods can yield suboptimal results. To this end, we propose DisenTS, a tailored framework for modeling disentangled channel evolving patterns in general multivariate time series forecasting. The central idea of DisenTS is to model the potentially diverse patterns within multivariate time series data in a decoupled manner. Technically, the framework employs multiple distinct forecasting models, each tasked with uncovering a unique evolving pattern. To guide the learning process without supervision of the pattern partition, we introduce a novel Forecaster Aware Gate (FAG) module that adaptively generates routing signals according to both the forecasters' states and the input series' characteristics. The forecasters' states are derived from the Linear Weight Approximation (LWA) strategy, which quantizes the complex deep neural networks into compact matrices. Additionally, a Similarity Constraint (SC) is proposed to guide each model to specialize in an underlying pattern by minimizing the mutual information between the representations.
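
The routing idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the gate emits one logit per channel and forecaster, that routing weights come from a softmax over forecasters, and that the Similarity Constraint is approximated by a squared cosine-similarity penalty between the compact matrices (Figure 2 describes it as a pair-wise orthogonal penalty). All function names and tensor shapes are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def combine_forecasts(forecasts, gate_logits):
    """Weighted sum of K forecasters' outputs (hypothetical shapes).

    forecasts:   (K, C, H) predictions of K forecasters for C channels, horizon H
    gate_logits: (C, K)    per-channel routing scores, assumed to come from a gate
    returns:     (C, H) combined forecast and (C, K) routing weights
    """
    beta = softmax(gate_logits, axis=-1)  # each row sums to 1
    combined = np.einsum("ck,kch->ch", beta, forecasts)
    return combined, beta

def pairwise_orthogonal_penalty(mats):
    """Mean squared cosine similarity over distinct pairs of matrices.

    mats: (K, L, H) one compact matrix per forecaster (assumed form).
    Returns 0.0 when all matrices are mutually orthogonal.
    """
    k = mats.shape[0]
    if k < 2:
        return 0.0
    flat = mats.reshape(k, -1)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    gram = flat @ flat.T
    off_diagonal = gram[~np.eye(k, dtype=bool)]
    return float((off_diagonal ** 2).mean())
```

With uniform gate logits the combined forecast reduces to the plain average of the forecasters, so any gain from routing has to come from the gate learning channel-specific weights.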

Paper Structure

This paper contains 24 sections, 10 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: (a) and (b) illustrate the separated and unified modeling schemes. (c) compares the per-channel MSE on the ETTh2 dataset under the two schemes. The unified modeling approach degrades performance on the Oil Temperature (OT) channel while improving accuracy on the other channels, which record external power load.
  • Figure 2: (a) During the forecasting stage, the framework uses multiple backbone models to uncover distinct evolving patterns, and the final results are predicted through a weighted sum. (b) The Forecaster Aware Gate (FAG) module combines the states of the forecasting models and the characteristics of the input series to generate appropriate routing signals $\beta$. (c) The Linear Weight Approximation (LWA) strategy quantifies each backbone model with a small matrix. (d) The Similarity Constraint is applied to the approximated matrices to ensure disentanglement among the models, enforced through a pair-wise orthogonal penalty.
  • Figure 3: Averaged MSE evaluation on short-term multivariate time series forecasting comparing DisenTS with state-of-the-art channel-independent methods. The lookback length is set to 96 for all experimental settings.
  • Figure 4: Ablation study on the number of experts. We report the MAE of DisenTS with different forecasting models and values of $K$.
  • Figure 5: Qualitative visualization of the DisenTS-enhanced iTransformer on the Traffic dataset with a prediction length of 336. (a) visualizes the measured transformation matrices of the 4 backbones and the corresponding approximation error $\epsilon$ in the first testing batch. (b) presents the forecasting results of all 4 backbones and the final prediction of DisenTS on a sample from the same testing batch. (c) illustrates how the LWA of the first backbone model evolves under EMA updates during training.
  • ...and 1 more figures
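
The captions of Figures 2 and 5 mention that LWA summarizes each backbone with a small matrix, reports an approximation error $\epsilon$, and tracks the matrix with EMA during training. The sketch below shows one way such a summary could be computed. It is an assumption-laden illustration rather than the paper's procedure: the least-squares fit, the relative Frobenius error used for $\epsilon$, the momentum value, and all function names are hypothetical.

```python
import numpy as np

def fit_linear_approximation(x, y):
    """Fit W so that x @ W approximates one backbone's outputs.

    x: (N, L) batch of lookback windows
    y: (N, H) the backbone's forecasts for that batch
    returns: W with shape (L, H) and a relative error eps (assumed definition)
    """
    w, *_ = np.linalg.lstsq(x, y, rcond=None)  # ordinary least squares
    eps = float(np.linalg.norm(x @ w - y) / (np.linalg.norm(y) + 1e-12))
    return w, eps

def ema_update(w_ema, w_batch, momentum=0.9):
    # Exponential moving average of the per-batch matrix (momentum is assumed).
    return momentum * w_ema + (1.0 - momentum) * w_batch
```

A large $\epsilon$ under this definition would indicate that the backbone behaves far from linearly on the batch, so the compact matrix is only a coarse description of its state.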