The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for Multivariate Time Series Forecasting

Lu Han; Han-Jia Ye; De-Chuan Zhan

TL;DR

The paper investigates why Channel Independent (CI) training often beats Channel Dependent (CD) training in multivariate time series forecasting. In extensive experiments across nine real-world datasets and multiple algorithms, CI typically yields lower MAE/MSE and lower variance, challenging the assumption that modeling channel correlations via CD is always beneficial. The authors provide theoretical and empirical analysis of a linear model showing that CI trades capacity for robustness to distribution drift, which shows up as differences between the train and test ACF of individual channels, and they propose Predict Residuals with Regularization (PRReg) to boost CD performance. They also offer practical guidelines and discuss factors that influence CI/CD performance, aiming to inform the design of more robust MTS forecasting approaches.

Abstract

Multivariate time series data comprises various channels of variables. The multivariate forecasting models need to capture the relationship between the channels to accurately predict future values. However, recently, there has been an emergence of methods that employ the Channel Independent (CI) strategy. These methods view multivariate time series data as separate univariate time series and disregard the correlation between channels. Surprisingly, our empirical results have shown that models trained with the CI strategy outperform those trained with the Channel Dependent (CD) strategy, usually by a significant margin. Nevertheless, the reasons behind this phenomenon have not yet been thoroughly explored in the literature. This paper provides comprehensive empirical and theoretical analyses of the characteristics of multivariate time series datasets and the CI/CD strategy. Our results conclude that the CD approach has higher capacity but often lacks robustness to accurately predict distributionally drifted time series. In contrast, the CI approach trades capacity for robust prediction. Practical measures inspired by these analyses are proposed to address the capacity and robustness dilemma, including a modified CD method called Predict Residuals with Regularization (PRReg) that can surpass the CI strategy. We hope our findings can raise awareness among researchers about the characteristics of multivariate time series and inspire the construction of better forecasting models.
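To make the two strategies in the abstract concrete, here is a minimal sketch of how a plain linear forecaster could be fitted under each one. It assumes NumPy, ordinary least squares, and synthetic random data; the helper names (`make_windows`, `fit_linear_cd`, `fit_linear_ci`) are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def make_windows(series, L, H):
    """Slice a (T, C) array into look-back windows X (N, L, C) and horizons Y (N, H, C)."""
    T = series.shape[0]
    n = T - L - H + 1
    X = np.stack([series[i:i + L] for i in range(n)])
    Y = np.stack([series[i + L:i + L + H] for i in range(n)])
    return X, Y

def fit_linear_cd(X, Y):
    """CD: one map from the history of all channels to the future of all channels."""
    n = X.shape[0]
    A = X.reshape(n, -1)                          # (N, L*C)
    B = Y.reshape(n, -1)                          # (N, H*C)
    W, *_ = np.linalg.lstsq(A, B, rcond=None)
    return W                                      # (L*C, H*C)

def fit_linear_ci(X, Y):
    """CI: one shared map from L past values to H future values, applied per channel."""
    n, L, C = X.shape
    H = Y.shape[1]
    A = X.transpose(0, 2, 1).reshape(n * C, L)    # every channel becomes its own sample
    B = Y.transpose(0, 2, 1).reshape(n * C, H)
    W, *_ = np.linalg.lstsq(A, B, rcond=None)
    return W                                      # (L, H)

# Synthetic (T=200, C=3) data, used only to check shapes (an assumption, not a real dataset).
rng = np.random.default_rng(0)
series = rng.standard_normal((200, 3))
X, Y = make_windows(series, L=8, H=4)
W_cd = fit_linear_cd(X, Y)
W_ci = fit_linear_ci(X, Y)
print(W_cd.shape, W_ci.shape)  # (24, 12) (8, 4)
```

In this sketch the CD model has C² times as many coefficients as the CI model (288 versus 32 here), which is one way to see the capacity side of the trade-off. The CI model in turn sees C times as many training samples for its smaller weight matrix.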

Paper Structure (21 sections, 2 theorems, 19 equations, 10 figures, 5 tables)

Introduction
Preliminaries
Multivariate Time Series Forecasting
Channel Dependent (CD) Strategy
Channel Independent (CI) Strategy
Empirical Comparison of CD and CI
Experiment Setup
Datasets.
Evaluation metrics.
Compared methods.
Other details.
Main Results
Analysis
Distribution Drift
CI Alleviates Distribution Drift
...and 6 more sections

Key Result

Proposition 4.2

Assume a long-term AR model on a time series ${\boldsymbol{x}}$ with look-back window (order) $L$ and horizon $H$, defined as: where ${\boldsymbol{W}} \in \mathbb{R}^{H\times L}$ holds the coefficients of the model. Then the best estimate ${\boldsymbol{W}}^*$ can be computed by an extended version of the Yule-Walker equation (Yule, 1927; Walker, 1931): where $\rho(\tau) = \rho(-\tau)$ is the

Figures (10)

Figure 1: Comparison of two training strategies for Multivariate Time Series Forecasting (MTSF) tasks. The left shows the Channel Dependent (CD) strategy where all the channels are taken as input and forecasted future values depend on the history of all the channels. The right shows the Channel Independent (CI) strategy, which treats the multivariate series as multiple univariate series and trains a unified model on these series. The prediction of each channel depends solely on its own historical values, and the relationship between different channels is ignored.
Figure 2: The performance distribution of 7 models utilizing the CI and CD strategies. Values come from the MAE and MSE analysis tables (tb:mae_analysis, tb:mse_analysis). The prediction length is 24 for the ILI dataset and 48 for the others. In most cases, CI has a lower mean error and a smaller variance than CD, so CI performs better on average and its performance varies less across models.
Figure 3: The ACF of train series and test series. The caption of each subfigure gives the tuple (channel, dataset). In each subfigure, the leftmost plot displays the series split, with the training series in black, validation in blue, and test in purple. The middle and right plots show the ACF of the training and test series, respectively. The results reveal a significant discrepancy in the statistics between the training and test series.
Figure 4: The difference in ACF between training data and test data. The ACF difference for each channel is depicted in bar charts, arranged in descending order. The sum diff, which represents the overall ACF difference under the CI strategy, is shown as a horizontal line. The sum diff is smaller than the ACF difference of each channel, indicating that the CI strategy can effectively mitigate distribution drift.
Figure 5: The train error, test error, W diff and gen error when using CI and CD strategy on the 9 datasets. Train/test error measures model capacity on train/test data. W diff measures the difference between the optimal model on train and test data. It reveals the robustness of a model. Gen error measures the risk of an algorithm. Although CD can achieve lower optimal error, it is much less robust to the distribution drift than CI. Consequently, in most cases, CI outperforms CD.
...and 5 more figures
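The train/test ACF comparison described in the captions for Figures 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact computation: the gap is taken as the mean absolute ACF difference over lags, and the pooled ("sum diff"-style) quantity is approximated by averaging the ACF across channels before comparing. The function names `acf` and `acf_drift` are hypothetical.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation rho(0..max_lag) of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def acf_drift(train, test, max_lag):
    """Per-channel and channel-pooled ACF gap between train and test splits.

    train, test: arrays of shape (T, C). The gap is the mean absolute
    difference over lags (an illustrative choice, not the paper's metric).
    """
    C = train.shape[1]
    rho_train = np.stack([acf(train[:, c], max_lag) for c in range(C)])  # (C, max_lag+1)
    rho_test = np.stack([acf(test[:, c], max_lag) for c in range(C)])
    per_channel = np.abs(rho_train - rho_test).mean(axis=1)              # (C,)
    pooled = np.abs(rho_train.mean(axis=0) - rho_test.mean(axis=0)).mean()
    return per_channel, pooled

# Synthetic splits, for demonstration only (an assumption, not a real dataset).
rng = np.random.default_rng(0)
train = rng.standard_normal((300, 4))
test = rng.standard_normal((100, 4))
per_channel, pooled = acf_drift(train, test, max_lag=20)
print(per_channel, pooled)
```

Under this particular definition, the triangle inequality guarantees that the pooled gap never exceeds the average per-channel gap, because channel-wise differences of opposite sign partially cancel when the ACFs are averaged.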

Theorems & Definitions (4)

Definition 4.1: AutoCorrelation Function (ACF) (Madsen, 2007)
Proposition 4.2: Extended Yule-Walker equation (Yule, 1927; Walker, 1931)
Definition 4.3: Objective of Linear (CD) and Linear (CI)
Proposition 4.4: Yule-Walker equation of Linear (CD) and Linear (CI)
