sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting
Jiaheng Yin, Zhengxin Shi, Jianshen Zhang, Xiaomin Lin, Yulin Huang, Yongzhi Qi, Wei Qi
TL;DR
Transformer-based time-series forecasting models often underperform simple linear models on long horizons and lack scalable inter-sequence modeling. The paper introduces sTransformer, which adds two components to the Transformer: a Sequence and Temporal Convolutional Network (STCN) that captures temporal and inter-sequence information, and a Sequence-guided Mask Attention mechanism (SeqMask) that captures global feature information. Across five public multivariate datasets for long-term forecasting, sTransformer achieves state-of-the-art results, outperforming linear predictors and prior SOTA methods. It also performs strongly on short-term forecasting and anomaly detection. The results support the modular, scalable design and suggest that sTransformer is a solid baseline across time-series tasks.
Abstract
In recent years, numerous Transformer-based models have been applied to long-term time-series forecasting (LTSF) tasks. However, recent studies with linear models have questioned their effectiveness, demonstrating that simple linear layers can outperform sophisticated Transformer-based models. In this work, we review existing Transformer-based models and categorize them into two main types: (1) modifications to the model structure and (2) modifications to the input data. The former offers scalability but falls short in capturing inter-sequential information, while the latter preprocesses the time-series data but is difficult to use as a scalable module. We propose sTransformer, which introduces the Sequence and Temporal Convolutional Network (STCN) to fully capture both sequential and temporal information. Additionally, we introduce a Sequence-guided Mask Attention mechanism to capture global feature information. Our approach captures inter-sequential information while keeping the module scalable. We compare our model with linear models and existing forecasting models on long-term time-series forecasting and achieve new state-of-the-art results. We also conduct experiments on other time-series tasks and achieve strong performance. These results demonstrate that Transformer-based structures remain effective and that our model can serve as a viable baseline for time-series tasks.
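
For readers unfamiliar with the "simple linear layers" the abstract refers to, the following is a minimal sketch of such a baseline: a single linear map from a lookback window to a forecast horizon, fitted by least squares. It is an illustration only, not the sTransformer model or any code from the paper. The function names, the window lengths (48 and 12), and the synthetic sine series are all assumptions made for this example.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a 1-D series into (past window, future window) pairs."""
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

def add_bias(X):
    """Append a constant column so the linear map includes a bias term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_linear_forecaster(X, Y):
    """Least-squares fit of one linear layer: past window -> future window."""
    W, *_ = np.linalg.lstsq(add_bias(X), Y, rcond=None)
    return W  # shape: (lookback + 1, horizon)

def predict(W, X):
    """Apply the fitted linear layer to a batch of past windows."""
    return add_bias(X) @ W

# Hypothetical toy data: a noise-free sine wave with period 24.
t = np.arange(400, dtype=float)
series = np.sin(2 * np.pi * t / 24)

X, Y = make_windows(series, lookback=48, horizon=12)
W = fit_linear_forecaster(X[:250], Y[:250])   # fit on the earlier windows
mse = float(np.mean((predict(W, X[250:]) - Y[250:]) ** 2))  # evaluate on later ones
print(f"held-out MSE: {mse:.3e}")
```

A purely periodic toy series is easy for a linear map, so the held-out error here says nothing about real benchmarks. The sketch only shows how small the linear baseline is compared with a Transformer-based forecaster.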
