InjectTST: A Transformer Method of Injecting Global Information into Independent Channels for Long Time Series Forecasting

Ce Chi; Xing Wang; Kexin Yang; Zhiyan Song; Di Jin; Lin Zhu; Chao Deng; Junlan Feng

TL;DR

InjectTST tackles the gap between channel independence and channel mixing in multivariate time series forecasting by injecting selective cross-channel global information into independent Transformer channels. The method adds a channel identifier, two global mixing modules (CaT and PaT), and a self-contextual attention module to enable channel-specific, noise-robust integration of global context without explicit channel mixing. Empirical results on Weather, Electricity, Traffic, and ETTh/ETTm datasets demonstrate state-of-the-art performance and robustness across varying sequence lengths, with modest computational cost. This framework provides a principled route to combine the benefits of channel independence and channel mixing, with room for further work on how global information is designed and injected.

Abstract

The Transformer has become one of the most popular architectures for multivariate time series (MTS) forecasting. Recent Transformer-based MTS models generally prefer channel-independent structures, based on the observation that channel independence can alleviate noise and distribution drift and thus improve robustness. Nevertheless, channel dependency remains an inherent characteristic of MTS and carries valuable information. Designing a model that combines the merits of both channel-independent and channel-mixing structures is key to further improving MTS forecasting, but it is a challenging problem. To address it, this paper proposes InjectTST, a method for injecting global information into a channel-independent Transformer. Instead of designing a channel-mixing model directly, we retain the channel-independent backbone and gradually inject global information into individual channels in a selective way. InjectTST comprises a channel identifier, a global mixing module, and a self-contextual attention module. The channel identifier helps the Transformer distinguish channels for better representation. The global mixing module produces cross-channel global information. Through the self-contextual attention module, the independent channels can selectively attend to useful global information without degrading robustness, so channel mixing is achieved implicitly. Experiments indicate that InjectTST achieves stable improvement over state-of-the-art models.
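The injection step described in the abstract can be pictured as a cross-attention between one channel's tokens and the global tokens. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the channel tokens act as queries and the global tokens as keys and values, uses a single head with no learned projections, and adds the result back through a residual connection. The function names `inject_global`, `softmax`, and `dot` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def inject_global(channel_tokens, global_tokens):
    """Cross-attention sketch (assumption: channel = query, global = key/value).

    channel_tokens: list of d-dimensional vectors for ONE channel.
    global_tokens:  list of d-dimensional vectors of cross-channel information.
    Returns the channel tokens with attended global context added (residual).
    """
    d = len(channel_tokens[0])
    out = []
    for q in channel_tokens:
        # Each channel token decides how much weight each global token gets.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in global_tokens])
        context = [sum(w * v[j] for w, v in zip(weights, global_tokens))
                   for j in range(d)]
        # Residual: the channel's own representation is kept intact.
        out.append([qi + ci for qi, ci in zip(q, context)])
    return out


channel = [[1.0, 0.0], [0.0, 1.0]]
global_info = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
mixed = inject_global(channel, global_info)
```

Because the attention weights are computed per channel token, each channel can weight the same global tokens differently, which is one way to read "selective" injection. The residual path keeps the channel-independent representation available even when the global tokens are uninformative.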

Paper Structure (25 sections, 8 equations, 6 figures, 3 tables)

This paper contains 25 sections, 8 equations, 6 figures, 3 tables.

Introduction
Related Works
Methodology
Channel-Independent Backbone
Patching and Projection
Channel Identifier
Global Mixing Module
CaT Global Mixing Module
PaT Global Mixing Module
Self-Contextual Attention Module
Self-Supervised Training and Normalization
Experiments
Experimental Setup
Datasets
Baselines
...and 10 more sections

Figures (6)

Figure 1: Different types of MTS forecasting frameworks. The decoder can be replaced by a simple prediction head. (a) In a channel-independent framework, the prediction for a channel does not depend on other channels. The channels share the same model. (b) In a channel-mixing framework, the channels are mixed into a unified representation, and the decoder then produces the prediction for all channels at the same time. (c) In our proposed InjectTST framework, the channel-independent structure is used as a backbone. Each channel receives additional global information so as to achieve channel mixing implicitly.
Figure 2: InjectTST architecture. In the channel-independent backbone, a positional encoding and a channel identifier (CID for short) are added to the patches of each channel. In the global mixing module, the channels are mixed to produce the global information. Finally, in the self-contextual attention (SCA) module, the global information is injected into each channel via a cross-attention design.
Figure 4: Forecasting performance of InjectTST and PatchTST with varying historical sequence lengths. The historical sequence lengths $L$ are chosen from {48, 96, 192, 336, 512, 720}. The prediction lengths $T$ are chosen from {96, 720}.
Figure : (a) CaT global mixing module.
...and 1 more figures
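
The backbone step in the Figure 2 caption (patches of a channel receive a positional encoding and a channel identifier) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the patch length and stride are arbitrary, the linear projection of patches is omitted, and fixed vectors stand in for what would normally be learnable parameters. The names `make_patches` and `add_encodings` are hypothetical.

```python
def make_patches(series, patch_len, stride):
    """Split one channel into (possibly overlapping) patches of equal length."""
    last_start = len(series) - patch_len
    return [series[i:i + patch_len] for i in range(0, last_start + 1, stride)]


def add_encodings(patches, positional, channel_id):
    """Add a per-position vector and a per-channel vector to every patch.

    positional: one vector per patch position (shared by all channels).
    channel_id: one vector for this channel (shared by all its patches).
    Both are fixed here; treating them as learnable is left out of the sketch.
    """
    return [[x + p + c for x, p, c in zip(patch, pos, channel_id)]
            for patch, pos in zip(patches, positional)]


# One channel with 8 time steps, patch length 4, stride 2 -> 3 patches.
series = [float(t) for t in range(8)]
patches = make_patches(series, patch_len=4, stride=2)

# Hypothetical encodings chosen only so the arithmetic is easy to follow.
positional = [[float(n)] * 4 for n in range(len(patches))]
channel_id = [10.0] * 4

tokens = add_encodings(patches, positional, channel_id)
```

Since the backbone weights are shared across channels, the channel identifier is the only term here that differs between two channels with identical patch values, which is what lets the model tell them apart.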
