PSformer: Parameter-efficient Transformer with Segment Attention for Time Series Forecasting
Yanlong Wang, Jian Xu, Fei Ma, Shao-Lun Huang, Danny Dongning Sun, Xiao-Ping Zhang
TL;DR
PSformer addresses multivariate time series forecasting by combining parameter sharing (PS) with Spatial-Temporal Segment Attention (SegAtt). It partitions the input into cross-channel segments and applies two-stage SegAtt within a shared PS Block, capturing local spatio-temporal dependencies while keeping the parameter count small. Across eight long-term forecasting datasets, PSformer achieves state-of-the-art results on the majority of tasks, with better scalability and robustness than the baselines. The approach is most useful for high-dimensional forecasting where model size and training efficiency are critical, and it suggests a path toward applying PS and SegAtt in pre-trained time-series models.
Abstract
Time series forecasting remains a critical challenge across many domains, often complicated by high-dimensional data and long-term dependencies. This paper presents a transformer architecture for time series forecasting built on two key innovations: parameter sharing (PS) and Spatial-Temporal Segment Attention (SegAtt). We define a time series segment as the concatenation of the sequence patches that occupy the same position across different variables. The proposed model, PSformer, uses parameter sharing to reduce the number of trainable parameters, which improves model efficiency and scalability. SegAtt is designed to capture local spatio-temporal dependencies by computing attention over the segments, and to improve the global representation by integrating information across segments. Together, parameter sharing and SegAtt significantly improve forecasting performance. Extensive experiments on benchmark datasets demonstrate that PSformer outperforms popular baselines and other transformer-based approaches in both accuracy and scalability.
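To make the segment definition concrete, the following is a minimal NumPy sketch of how such segments could be built from a multivariate series. It is an illustration under stated assumptions, not the authors' implementation: the function name `make_segments`, the `(num_vars, seq_len)` input layout, the non-overlapping patches, and the requirement that the sequence length be divisible by the patch length are all hypothetical choices made for this example.

```python
import numpy as np


def make_segments(x: np.ndarray, patch_len: int) -> np.ndarray:
    """Group same-position patches from every variable into one segment.

    Assumed layout (hypothetical, for illustration only):
      x       : (num_vars, seq_len)
      returns : (num_segments, num_vars * patch_len)
    """
    num_vars, seq_len = x.shape
    if seq_len % patch_len != 0:
        raise ValueError("seq_len must be divisible by patch_len")
    num_segments = seq_len // patch_len

    # Split each variable's sequence into non-overlapping patches:
    # (num_vars, num_segments, patch_len)
    patches = x.reshape(num_vars, num_segments, patch_len)

    # Bring the patch position to the front, then concatenate the patches
    # of all variables that share that position into a single vector.
    return patches.transpose(1, 0, 2).reshape(num_segments, num_vars * patch_len)


# Toy input: 2 variables, 6 time steps, patches of length 3.
x = np.arange(12).reshape(2, 6)
segments = make_segments(x, patch_len=3)
print(segments)
# [[ 0  1  2  6  7  8]
#  [ 3  4  5  9 10 11]]
```

Each row mixes values from every variable over the same short time window, which is what would let attention computed over a segment relate channels and nearby time steps together.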
