PSformer: Parameter-efficient Transformer with Segment Attention for Time Series Forecasting

Yanlong Wang, Jian Xu, Fei Ma, Shao-Lun Huang, Danny Dongning Sun, Xiao-Ping Zhang

TL;DR

PSformer tackles multivariate time series forecasting by marrying parameter sharing with a novel Spatial-Temporal Segment Attention mechanism. By partitioning the input into cross-channel segments and applying two-stage SegAtt with a shared PS Block, it captures local spatio-temporal dependencies while maintaining a compact parameter footprint. Across eight long-term forecasting datasets, PSformer achieves state-of-the-art results on the majority of tasks, illustrating improved scalability and robustness versus baselines. The approach offers practical benefits for high-dimensional forecasting where model size and training efficiency are critical, and opens pathways for applying PS and SegAtt in pre-trained time-series models.

Abstract

Time series forecasting remains a critical challenge across various domains, often complicated by high-dimensional data and long-term dependencies. This paper presents a novel transformer architecture for time series forecasting, incorporating two key innovations: parameter sharing (PS) and Spatial-Temporal Segment Attention (SegAtt). We also define the time series segment as the concatenation of sequence patches from the same positions across different variables. The proposed model, PSformer, reduces the number of training parameters through the parameter sharing mechanism, thereby improving model efficiency and scalability. The introduction of SegAtt could enhance the capability of capturing local spatio-temporal dependencies by computing attention over the segments, and improve global representation by integrating information across segments. The combination of parameter sharing and SegAtt significantly improves the forecasting performance. Extensive experiments on benchmark datasets demonstrate that PSformer outperforms popular baselines and other transformer-based approaches in terms of accuracy and scalability, establishing itself as an accurate and scalable tool for time series forecasting.
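
The segment definition above (patches from the same position, concatenated across variables) can be illustrated with a minimal NumPy sketch. The function name `make_segments`, the `(num_vars, seq_len)` input layout, and the choice to put one segment per row are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def make_segments(x: np.ndarray, patch_len: int) -> np.ndarray:
    """Group same-position patches from every variable into one segment.

    x: array of shape (num_vars, seq_len), an assumed layout.
    Returns an array of shape (num_patches, num_vars * patch_len),
    where row i concatenates patch i of each variable.
    """
    num_vars, seq_len = x.shape
    if seq_len % patch_len != 0:
        raise ValueError("seq_len must be divisible by patch_len")
    num_patches = seq_len // patch_len
    # Split each variable's series into consecutive, non-overlapping patches.
    patches = x.reshape(num_vars, num_patches, patch_len)
    # Bring the patch index to the front, then merge variables and time steps.
    return patches.transpose(1, 0, 2).reshape(num_patches, num_vars * patch_len)

# Toy input: 2 variables, 6 time steps, patch length 3.
x = np.arange(12).reshape(2, 6)
segs = make_segments(x, 3)
print(segs)
```

In this toy case the first segment holds time steps 0 to 2 of both variables and the second holds time steps 3 to 5, so the segments stay in chronological order.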

Paper Structure

This paper contains 38 sections, 6 equations, 15 figures, 19 tables.

Figures (15)

  • Figure 1: PSformer Dataflow Pipeline. Multivariate time series data first undergo patching and cross-channel merging before being fed into the PSformer Encoder. During the cross-channel patching stage, patches from the same position across different time series variables are concatenated to form a mixed local multivariate sequence, where the different mixed time series maintain a strict chronological order, capturing global spatiotemporal dynamics. Finally, inverse transformation and linear mapping are applied to generate the prediction.
  • Figure 2: PSformer Network Structure. The PSformer processes input through a two-stage Segment Attention connected via residual structures. In the final fusion stage, it passes through a PS Block to generate the output. Notably, the PS Block used here is identical to those in the two-stage Segment Attention. Within the Segment Attention modules, the outputs of the PS Block are leveraged as Q, K, V matrices to compute cross-channel dot-product attention over temporal sequences. This attention matrix captures local spatiotemporal interactions across channels. The right portion of the figure illustrates this cross-channel transformation mechanism.
  • Figure 3: Ablation analysis on hyper-parameter $\rho$. As $\rho$ increases from 0 to 1 in steps of 0.1, the prediction loss first decreases slightly and then increases significantly once $\rho$ exceeds a threshold, so $\rho$ should be selected carefully.
  • Figure 4: Training and validation loss curves of the ETTh1 and ETTm1 datasets.
  • Figure 5: SegAtt map and forecast samples for ETTh1-96
  • ...and 10 more figures
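
The structure described in the Figure 2 caption (two Segment Attention stages joined by residual connections, followed by a final pass through the same shared PS Block) can be sketched roughly as below. This is an illustrative sketch under several assumptions, not the authors' implementation: `ps_block` is a hypothetical stand-in that uses a single shared linear map, because the PS Block's internals are not given here; the `(C, N)` input layout, the scaling by the square root of the feature size, and the exact placement of the residuals are likewise assumed.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ps_block(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the PS Block: one shared linear map
    # applied along the last axis. The real block may be richer.
    return x @ w

def seg_att(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # The shared block's output serves as Q, K and V (per the caption).
    q = k = v = ps_block(x, w)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (C, C) attention scores
    return softmax(scores, axis=-1) @ v

def encoder_layer(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Two attention stages with residual connections (assumed placement),
    # then a final pass through the same shared weights.
    h = seg_att(x, w) + x
    h = seg_att(h, w) + h
    return ps_block(h, w)

rng = np.random.default_rng(0)
C, N = 6, 4  # assumed: C mixed channel-time rows, N segments
x = rng.standard_normal((C, N))
w = rng.standard_normal((N, N)) * 0.1
out = encoder_layer(x, w)
print(out.shape)
```

The point of the sketch is that a single weight matrix `w` is reused in all three places, which is the sense in which parameter sharing keeps the parameter count small in this toy setting.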