InjectTST: A Transformer Method of Injecting Global Information into Independent Channels for Long Time Series Forecasting
Ce Chi, Xing Wang, Kexin Yang, Zhiyan Song, Di Jin, Lin Zhu, Chao Deng, Junlan Feng
TL;DR
InjectTST bridges channel independence and channel mixing in multivariate time series forecasting by selectively injecting cross-channel global information into the channels of a channel-independent Transformer. The method adds a channel identifier, two global mixing modules (CaT and PaT), and a self-contextual attention module, so that each channel can integrate global context in a channel-specific, noise-robust way without explicit channel mixing. Empirical results on the Weather, Electricity, Traffic, and ETTh/ETTm datasets demonstrate state-of-the-art performance and robustness across varying sequence lengths, at modest computational cost. The framework offers a principled route to combining the benefits of channel independence and channel mixing, and leaves room for further designs of the global information and of the injection mechanism.
Abstract
The Transformer has become one of the most popular architectures for multivariate time series (MTS) forecasting. Recent Transformer-based MTS models generally prefer channel-independent structures, based on the observation that channel independence alleviates noise and distribution drift and therefore improves robustness. Nevertheless, channel dependency remains an inherent characteristic of MTS and carries valuable information. Designing a model that combines the merits of both channel-independent and channel-mixing structures is key to further improving MTS forecasting, but doing so is challenging. To address this problem, this paper proposes InjectTST, a method for injecting global information into a channel-independent Transformer. Instead of designing a channel-mixing model directly, we retain the channel-independent backbone and gradually inject global information into individual channels in a selective way. InjectTST comprises a channel identifier, a global mixing module, and a self-contextual attention module. The channel identifier helps the Transformer distinguish channels for better representation. The global mixing module produces cross-channel global information. Through the self-contextual attention module, the independent channels can selectively concentrate on useful global information without robustness degradation, so that channel mixing is achieved implicitly. Experiments indicate that InjectTST achieves stable improvement compared with state-of-the-art models.
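
The three components named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `inject_global_information`, all weight shapes, the additive form of the channel identifier, the linear per-patch mixing used to build the global information, and the use of single-head cross-attention with a residual connection are hypothetical choices made here for clarity. The abstract does not specify any of them.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def inject_global_information(tokens, channel_id, w_mix, w_q, w_k, w_v):
    """Hypothetical sketch of one injection step.

    tokens:     (C, N, D) patch tokens, one sequence of N patches per channel
    channel_id: (C, D)    one identifier vector per channel (assumed additive)
    w_mix:      (C*D, D)  assumed linear mixing across channels, per patch
    w_q, w_k, w_v: (D, D) projections for single-head cross-attention
    """
    C, N, D = tokens.shape

    # Channel identifier: tag every token with its channel's vector.
    x = tokens + channel_id[:, None, :]                     # (C, N, D)

    # Global mixing (assumption): for each patch position, concatenate the
    # tokens of all channels and project them to one global token.
    g = x.transpose(1, 0, 2).reshape(N, C * D) @ w_mix      # (N, D)

    # Self-contextual attention (assumption): each channel's tokens act as
    # queries over the global tokens, which serve as keys and values.
    q = x @ w_q                                             # (C, N, D)
    k = g @ w_k                                             # (N, D)
    v = g @ w_v                                             # (N, D)
    attn = softmax(q @ k.T / np.sqrt(D), axis=-1)           # (C, N, N)

    # Residual connection keeps the channel-independent representation and
    # adds only the global information each channel attends to.
    return x + attn @ v, attn

# Toy usage with random inputs: 3 channels, 4 patches, model width 8.
rng = np.random.default_rng(0)
C, N, D = 3, 4, 8
tokens = rng.normal(size=(C, N, D))
channel_id = rng.normal(size=(C, D))
w_mix = rng.normal(size=(C * D, D)) * 0.1
w_q, w_k, w_v = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))
out, attn = inject_global_information(tokens, channel_id, w_mix, w_q, w_k, w_v)
```

In this sketch the channels never exchange tokens directly. Each channel only reads from the shared global tokens through its own attention weights, which is one way to read the abstract's statement that channel mixing is achieved implicitly.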
