CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting
Wang Xue, Tian Zhou, Qingsong Wen, Jinyang Gao, Bolin Ding, Rong Jin
TL;DR
CARD is a channel-aligned robust blend Transformer for time series forecasting. It addresses the limitations of channel-independent designs by capturing cross-channel dependencies with a channel-aware attention mechanism, and it uses a token blend module to build multi-scale representations. It also integrates EMA-smoothed token attention, a dynamic channel projection that reduces computational cost, and a signal decay-based loss that emphasizes near-future forecasts while stabilizing training. Across seven long-horizon datasets and auxiliary tasks, CARD achieves state-of-the-art accuracy, remains robust as input sequences grow longer, and performs strongly on anomaly detection and imputation. The work advances robust, scalable forecasting for high-dimensional, noisy time series, with practical implications for real-world systems.
Abstract
Recent studies have demonstrated the great power of Transformer models for time series forecasting. One of the key elements behind the Transformer's success is the channel-independent (CI) strategy, which improves training robustness. However, ignoring the correlation among different channels under CI limits the model's forecasting capacity. In this work, we design a special Transformer, the Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of CI-type Transformers in time series forecasting. First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dynamical dependence among multiple variables over time. Second, to utilize multi-scale knowledge efficiently, we design a token blend module that generates tokens with different resolutions. Third, we introduce a robust loss function for time series forecasting to alleviate potential overfitting. This new loss function weights the importance of forecasts over a finite horizon based on prediction uncertainties. Our evaluation on multiple long-term and short-term forecasting datasets demonstrates that CARD significantly outperforms state-of-the-art time series forecasting methods. The code is available at the following repository: https://github.com/wxie9/CARD
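To make the idea of a horizon-weighted loss concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the weight on the forecast l steps ahead decays as l^(-power), with power = 0.5, and that the per-step error is the absolute error; the abstract states neither choice, and the function name `decay_weighted_mae` and its parameters are hypothetical.

```python
import numpy as np


def decay_weighted_mae(pred, target, power=0.5):
    """Mean absolute error with weights that decay along the forecast horizon.

    pred, target: arrays of shape (..., horizon); the last axis is the
    forecast step. The weight for step l (1-based) is l ** (-power).
    ASSUMPTION: this decay form and the absolute-error base loss are
    illustrative choices, not taken from the abstract.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    horizon = pred.shape[-1]

    # Float steps so that any numeric power is a valid exponent.
    steps = np.arange(1, horizon + 1, dtype=float)
    weights = steps ** (-power)

    # Weights broadcast over all leading (batch, channel) axes.
    abs_err = np.abs(pred - target)
    return float(np.mean(abs_err * weights))


if __name__ == "__main__":
    target = np.zeros(4)
    near_miss = np.array([1.0, 0.0, 0.0, 0.0])  # error on the first step
    far_miss = np.array([0.0, 0.0, 0.0, 1.0])   # same error on the last step
    print(decay_weighted_mae(near_miss, target))
    print(decay_weighted_mae(far_miss, target))
```

Under these assumptions, an error of the same size is penalized more when it occurs near the start of the horizon than near the end, which matches the stated goal of down-weighting the more uncertain far-future forecasts.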
