From Similarity to Superiority: Channel Clustering for Time Series Forecasting

Jialin Chen; Jan Eric Lenssen; Aosong Feng; Weihua Hu; Matthias Fey; Leandros Tassiulas; Jure Leskovec; Rex Ying

TL;DR

The paper proposes a novel and adaptable Channel Clustering Module (CCM) that dynamically groups channels with intrinsic similarities and leverages cluster information instead of individual channel identities, combining the best of the Channel-Dependent (CD) and Channel-Independent (CI) worlds.

Abstract

Time series forecasting has attracted significant attention in recent decades. Previous studies have shown that the Channel-Independent (CI) strategy improves forecasting performance by treating each channel individually, but it generalizes poorly to unseen instances and ignores potentially necessary interactions between channels. Conversely, the Channel-Dependent (CD) strategy mixes all channels, including irrelevant and indiscriminate information, which causes oversmoothing and limits forecasting accuracy. What is missing is a channel strategy that balances individual channel treatment, which improves forecasting performance, against the need to capture essential interactions between channels. Motivated by our observation that a time series model's performance boost against channel mixing correlates with the intrinsic similarity between a pair of channels, we develop a novel and adaptable Channel Clustering Module (CCM). CCM dynamically groups channels characterized by intrinsic similarities and leverages cluster information instead of individual channel identities, combining the best of the CD and CI worlds. Extensive experiments on real-world datasets demonstrate that CCM can (1) boost the performance of CI and CD models by an average margin of 2.4% and 7.2% on long-term and short-term forecasting, respectively; (2) enable zero-shot forecasting with mainstream time series forecasting models; and (3) uncover intrinsic time series patterns among channels and improve the interpretability of complex time series models.

Paper Structure (36 sections, 6 equations, 8 figures, 19 tables)

Introduction
Related Work
Time Series Forecasting Models
Channel Strategies in Time Series Forecasting
Preliminaries
Proposed Method
Motivation for Channel Similarity
CCM: Channel Clustering Module
Complexity Analysis
Experiments
Experimental Setup
Long-term Forecasting Results
Short-term Forecasting Results
Zero-shot Forecasting Results
Qualitative Visualization
...and 21 more sections

Figures (8)

Figure 1: The pipeline of applying Channel Clustering Module (CCM) to general time series models. (a) is the general framework of most time series models. (b) illustrates two modified modules when applying CCM: Cluster Assigner and Cluster-aware Feed Forward. Cluster Assigner learns channel clustering based on intrinsic similarities and creates prototype embeddings for each cluster via a cross-attention mechanism. The clustering probabilities $\{p_{i,k}\}$ are subsequently used in Cluster-aware Feed Forward to average $\{\mathbf{\theta}_k\}_{k=1}^K$, which are layer weights assigned to $K$ clusters, obtaining weights $\mathbf{\theta}^{i}$ for the $i$-th channel. The learned prototypes retain pre-trained knowledge, enabling zero-shot forecasting on unseen samples in both univariate and multivariate scenarios.
Figure 2: t-SNE visualization of channel and prototype embeddings by DLinear with CCM on the (a) ETTh1 and (b) ETTh2 datasets. The lower left corner shows the similarity matrix between channels.
Figure 4: Ablation Study on Cluster Ratios in terms of MSE loss with four base models. The forecasting horizon is 96. (left: ETTh1 dataset; right: ETTm1 dataset)
Figure 5: Efficiency analysis in model size and running time on the ETTh1 dataset.
Figure 6: (a) Channel-wise forecasting performance and (b) channel similarity on the ETTh1 dataset, illustrating the correlation between model performance and intrinsic similarity.
...and 3 more figures
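
The Figure 1 caption describes how the clustering probabilities $\{p_{i,k}\}$ average the cluster-level layer weights $\{\mathbf{\theta}_k\}_{k=1}^K$ into per-channel weights $\mathbf{\theta}^{i}$. A minimal sketch of that step follows, assuming the average is the probability-weighted sum $\mathbf{\theta}^{i} = \sum_{k} p_{i,k}\,\mathbf{\theta}_k$ and that each $\mathbf{\theta}_k$ is flattened into a plain vector. The function name `channel_weights` and all toy values are hypothetical and are not taken from the paper's code.

```python
from typing import List

Vector = List[float]


def channel_weights(probs: List[Vector], cluster_weights: List[Vector]) -> List[Vector]:
    """Mix cluster-level weights into one weight vector per channel.

    probs[i][k]        -- probability p_{i,k} that channel i belongs to cluster k
    cluster_weights[k] -- flattened layer weights theta_k of cluster k
    Returns theta^i = sum_k p_{i,k} * theta_k for every channel i
    (assumed reading of "average" in the Figure 1 caption).
    """
    dim = len(cluster_weights[0])
    return [
        [
            sum(p_ik * theta_k[d] for p_ik, theta_k in zip(p_i, cluster_weights))
            for d in range(dim)
        ]
        for p_i in probs
    ]


# Hypothetical toy setup: 3 channels, K = 2 clusters, 2 weights per cluster.
probs = [
    [1.0, 0.0],  # channel 0 belongs entirely to cluster 0
    [0.5, 0.5],  # channel 1 is split evenly between both clusters
    [0.0, 1.0],  # channel 2 belongs entirely to cluster 1
]
cluster_weights = [
    [2.0, 0.0],  # theta_0
    [0.0, 4.0],  # theta_1
]

per_channel = channel_weights(probs, cluster_weights)
for i, theta_i in enumerate(per_channel):
    print(f"channel {i}: {theta_i}")
```

A channel assigned entirely to one cluster simply inherits that cluster's weights, while a channel with mixed membership receives a blend. This is one way to read the claim that CCM relies on cluster information rather than individual channel identities.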
