
CSformer: Combining Channel Independence and Mixing for Robust Multivariate Time Series Forecasting

Haoxin Wang, Yipeng Mo, Kunlan Xiang, Nan Yin, Honghe Dai, Bixiong Li, Songhai Fan, Site Mo

TL;DR

CSformer tackles multivariate time series forecasting (MTSF) by combining channel independence with channel mixing through a two-stage attention mechanism with shared parameters, complemented by channel and sequence adapters. A dimension-augmented embedding lifts the input to a higher-dimensional representation, so that channel-MSA and sequence-MSA (multi-head self-attention along the channel and the time dimension, respectively) can be applied within one framework, fusing information across both dimensions while keeping the model efficient. Across diverse datasets it reports state-of-the-art performance with strong generalization, and targeted ablations support the contributions of the two-stage MSA, the adapters, and the training strategy. The approach offers a practical, robust option for real-world MTSF tasks and proposes a training paradigm of channel independence followed by mixing.
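As a rough illustration of the dimension-augmented embedding, the sketch below lifts every scalar of an input with N channels and L time steps to a D-dimensional vector, so that channels and time steps remain separate axes for attention. It assumes the simplest possible lift, a learnable per-scalar linear projection; the function and parameter names (`dimension_augmented_embedding`, `w`, `b`) and the sizes are hypothetical and not taken from the authors' code.

```python
import numpy as np

def dimension_augmented_embedding(x, w, b):
    """Lift each scalar observation to a D-dimensional vector (hypothetical form).

    x: (B, N, L) batch of N channels, each a length-L look-back window.
    w, b: (D,) learnable weight and bias shared by all scalars (assumption).
    Returns an array of shape (B, N, L, D).
    """
    # Add a trailing axis so every time point becomes a 1-dim feature ...
    x = x[..., np.newaxis]          # (B, N, L, 1)
    # ... then map that feature to D dims; broadcasting gives (B, N, L, D).
    return x * w + b

rng = np.random.default_rng(0)
B, N, L, D = 2, 7, 96, 16
x = rng.standard_normal((B, N, L))
w = rng.standard_normal(D)
b = np.zeros(D)
h = dimension_augmented_embedding(x, w, b)
print(h.shape)  # (2, 7, 96, 16)
```

Keeping N and L as separate axes, rather than turning each whole channel into one token (as iTransformer does) or each time step into one token (as a vanilla Transformer does), is what lets later stages attend over either axis.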

Abstract

In multivariate time series analysis, channel independence has been increasingly adopted and has shown excellent performance because it eliminates noise and the influence of irrelevant variables. However, it often oversimplifies the complex interactions among channels, which can lead to information loss. To address this challenge, we propose a strategy of channel independence followed by mixing. Based on this strategy, we introduce CSformer, a novel framework featuring a two-stage multi-head self-attention mechanism designed to extract and integrate both channel-specific and sequence-specific information. Distinctively, CSformer employs parameter sharing to strengthen the cooperation between these two types of information. Moreover, the framework incorporates sequence and channel adapters, which significantly improve the model's ability to identify important information across different dimensions. Extensive experiments on several real-world datasets demonstrate that CSformer achieves state-of-the-art overall performance.
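The abstract does not specify the adapters' internal form. One common design, assumed here purely for illustration, is a bottleneck adapter: a down-projection, a nonlinearity, an up-projection, and a residual connection, applied to the embedding axis. Since the two attention stages share weights, giving each stage its own small adapter is one plausible way to let channel and sequence features specialize. `BottleneckAdapter`, the hidden size of 4, the ReLU, and the zero initialization are all hypothetical choices.

```python
import numpy as np

class BottleneckAdapter:
    """Hypothetical adapter: h + ReLU(h @ w_down) @ w_up on the last axis."""

    def __init__(self, d_model, d_hidden, rng):
        self.w_down = rng.standard_normal((d_model, d_hidden)) / np.sqrt(d_model)
        # Zero-initialised up-projection, so the adapter starts as an identity map.
        self.w_up = np.zeros((d_hidden, d_model))

    def __call__(self, h):
        # h: (..., d_model); only the embedding axis is transformed.
        return h + np.maximum(h @ self.w_down, 0.0) @ self.w_up

rng = np.random.default_rng(0)
D = 16
# Assumption: separate parameters for the channel stage and the sequence stage.
channel_adapter = BottleneckAdapter(D, 4, rng)
sequence_adapter = BottleneckAdapter(D, 4, rng)

h = rng.standard_normal((2, 7, 96, D))  # (B, N, L, D) hidden states
print(np.allclose(channel_adapter(h), h))  # True: identity at initialisation
```

Because the adapter only touches the last axis, the same class can follow either attention stage without reshaping; only the parameters differ.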

Paper Structure

This paper contains 37 sections, 6 equations, 9 figures, 12 tables.

Figures (9)

  • Figure 1: Transformer-based models categorized by channel independence and channel mixing.
  • Figure 2: Structural comparison of Transformer (top left), iTransformer (top right), and the proposed CSformer (bottom). Transformer and iTransformer apply attention along the sequence and channel dimensions, respectively. CSformer instead embeds each sequence into a higher-dimensional space, which allows it to apply attention separately along both the channel and the sequence dimensions.
  • Figure 3: Overall framework of CSformer (d). The input sequence first undergoes a dimension-expansion operation before embedding (a). This expansion allows the standard MSA (b) to be applied separately to the channel and sequence dimensions (c). Channel-MSA and Sequence-MSA share weights but operate on different input dimensions.
  • Figure 4: Visualization of forecasts on the ETTm2 dataset with input length 96 and prediction length 96.
  • Figure 5: Case visualization of the attention score maps produced by the two-stage attention.
  • ...and 4 more figures
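The weight sharing described in the Figure 3 caption can be sketched in NumPy: one set of query, key, and value projections is applied first with the N channels as tokens (at each time step) and then with the L time steps as tokens (within each channel). This is a simplified sketch under stated assumptions, not the authors' implementation: it uses a single head, channel-then-sequence ordering, plain residual connections, and no normalization, feed-forward layers, or adapters, and all names (`two_stage_msa`, `self_attention`, `wq`, `wk`, `wv`) are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h, wq, wk, wv):
    """Single-head scaled dot-product attention; tokens lie on axis -2.

    h: (..., T, D). Returns the attended values (..., T, D) and scores (..., T, T).
    """
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = softmax(q @ np.swapaxes(k, -1, -2) / np.sqrt(k.shape[-1]))
    return scores @ v, scores

def two_stage_msa(h, wq, wk, wv):
    """h: (B, N, L, D). Both stages reuse the same wq, wk, wv."""
    # Stage 1, Channel-MSA: move N next to D so the N channels are the tokens.
    hc = np.swapaxes(h, 1, 2)                      # (B, L, N, D)
    out_c, scores_c = self_attention(hc, wq, wk, wv)
    h = h + np.swapaxes(out_c, 1, 2)               # residual, back to (B, N, L, D)
    # Stage 2, Sequence-MSA: the L time steps of each channel are the tokens.
    out_s, scores_s = self_attention(h, wq, wk, wv)
    h = h + out_s
    return h, scores_c, scores_s

rng = np.random.default_rng(0)
B, N, L, D = 2, 7, 96, 16
h = rng.standard_normal((B, N, L, D))
wq, wk, wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
out, scores_c, scores_s = two_stage_msa(h, wq, wk, wv)
print(out.shape, scores_c.shape, scores_s.shape)
# (2, 7, 96, 16) (2, 96, 7, 7) (2, 7, 96, 96)
```

The two returned score arrays are per-stage score maps of the kind visualized in Figure 5: an N × N map for every time step and an L × L map for every channel. Sharing `wq`, `wk`, and `wv` means the second stage adds no attention parameters of its own.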