CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting

Wang Xue, Tian Zhou, Qingsong Wen, Jinyang Gao, Bolin Ding, Rong Jin

TL;DR

CARD introduces a channel-aligned robust blend Transformer for time-series forecasting to overcome channel-independent limitations by capturing cross-channel dependencies with a channel-aware attention mechanism and a token blend module for multi-scale representations. It integrates an EMA-smoothed token attention, dynamic channel projection to reduce cost, and a signal decay-based loss that emphasizes near-future forecasts while stabilizing training. Across seven long-horizon datasets and auxiliary tasks, CARD yields state-of-the-art accuracy, shows robustness to longer input sequences, and demonstrates strong anomaly detection and imputation capabilities. The work advances robust, scalable forecasting in high-dimensional, noisy time-series with practical implications for real-world systems.
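The EMA smoothing mentioned above can be illustrated with a generic exponential moving average over a sequence of scalar token features. This is only a sketch of the standard EMA recurrence: the function name `ema_smooth`, the parameter `alpha`, and the choice to smooth a plain list of floats are assumptions for illustration, and this summary does not say where in the attention block CARD applies the smoothing.

```python
from typing import List, Sequence


def ema_smooth(values: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Exponential moving average: s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1].

    `alpha` is a hypothetical smoothing factor in (0, 1]; larger values track
    the raw sequence more closely, smaller values smooth more aggressively.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for x in values:
        if not smoothed:
            smoothed.append(float(x))  # initialise with the first observation
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# A single spike is damped and spread over the following positions.
print(ema_smooth([0.0, 1.0, 0.0, 0.0], alpha=0.5))
```

With `alpha = 1.0` the sequence is returned unchanged, so the smoothing strength can be tuned against a no-smoothing baseline.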

Abstract

Recent studies have demonstrated the great power of Transformer models for time series forecasting. One of the key elements behind the Transformer's success is the channel-independent (CI) strategy, which improves training robustness. However, ignoring the correlation among different channels limits the model's forecasting capacity. In this work, we design a special Transformer, the Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of CI-type Transformers in time series forecasting. First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dynamical dependence among multiple variables over time. Second, to use multi-scale knowledge efficiently, we design a token blend module that generates tokens with different resolutions. Third, we introduce a robust loss function for time series forecasting to alleviate potential overfitting. This new loss function weights the importance of forecasting over a finite horizon based on prediction uncertainties. Our evaluation on multiple long-term and short-term forecasting datasets demonstrates that CARD significantly outperforms state-of-the-art time series forecasting methods. The code is available at the following repository: https://github.com/wxie9/CARD
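The idea of weighting forecast steps by horizon can be sketched as a decay-weighted mean absolute error. This is a minimal illustration, not the authors' implementation: the function name `decay_weighted_mae`, the `power` parameter, and the inverse-power schedule `step ** -power` are assumptions chosen so that near-future steps carry more weight; this summary does not state the exact weighting the paper uses.

```python
from typing import Sequence


def decay_weighted_mae(
    pred: Sequence[float], target: Sequence[float], power: float = 0.5
) -> float:
    """Mean absolute error with forecast step l (1-indexed) weighted by l ** -power.

    The inverse-power schedule is a hypothetical choice for illustration:
    later steps, where predictions are typically less certain, count less.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if len(pred) == 0:
        raise ValueError("forecast horizon must be non-empty")
    total = 0.0
    for step, (p, y) in enumerate(zip(pred, target), start=1):
        weight = step ** -power  # weight shrinks as the horizon grows
        total += weight * abs(p - y)
    return total / len(pred)


# The same unit error costs more at step 1 than at step 4.
near = decay_weighted_mae([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
far = decay_weighted_mae([0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0])
print(near, far)
```

Setting `power = 0.0` recovers the ordinary unweighted MAE, which makes it easy to compare against a standard baseline loss.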

Paper Structure (49 sections, 15 equations, 40 figures, 30 tables, 2 algorithms)

Figures (40)

  • Figure 1: Illustration of the architecture of CARD.
  • Figure 2: Architecture for the CARD attention block.
  • Figure 3: Illustration example of token blend block in CARD.
  • Figure 4: Experiments on token blend size. The blend size varies over $1$, $2$, $4$, $8$, and $16$.
  • Figure 5: Sample prediction graph for the ETTh1 long-term forecasting task.
  • ...and 35 more figures