S2TX: Cross-Attention Multi-Scale State-Space Transformer for Time Series Forecasting
Zihao Wu, Juncheng Dong, Haoming Yang, Vahid Tarokh
TL;DR
This work tackles multivariate time-series forecasting by addressing cross-variate correlations and global-local interactions that are often modeled separately. It introduces S2TX, a cross-attention-based State-Space Transformer that uses a global Mamba module to extract long-range cross-variate context from coarse patches and a local patch-based Transformer to model short-range, variate-local patterns. A cross-attention mechanism fuses these contexts, enabling variate-level interactions and efficient global-local communication. Empirical results on seven benchmark datasets across multiple horizons demonstrate state-of-the-art performance with a low memory footprint and robustness to missing data.
Abstract
Time series forecasting has recently achieved significant progress with multi-scale models that address the heterogeneity between long- and short-range patterns. Despite their state-of-the-art performance, we identify two potential areas for improvement. First, the variates of the multivariate time series are processed independently. Second, the multi-scale (long- and short-range) representations are learned separately by two independent models without communication. In light of these concerns, we propose State Space Transformer with cross-attention (S2TX). S2TX employs a cross-attention mechanism to integrate a Mamba model for extracting long-range cross-variate context and a Transformer model with local window attention to capture short-range representations. By cross-attending to the global context, the Transformer model further facilitates variate-level interactions as well as local-global communication. Comprehensive experiments on seven classic long- and short-range time-series forecasting benchmark datasets demonstrate that S2TX achieves robust state-of-the-art results while maintaining a low memory footprint.
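To make the fusion step concrete, the following is a minimal NumPy sketch of local patch tokens cross-attending to a shared global context. It is an illustration under stated assumptions, not the S2TX implementation: the patch lengths (8 and 24), the model width (16), and all helper names are hypothetical, random linear projections stand in for learned embeddings, and the global tokens are a plain projection of coarse cross-variate patches rather than the output of a Mamba module. Local window self-attention, multiple heads, residual connections, and positional encodings are omitted.

```python
import numpy as np


def patchify(x, patch_len):
    """Split a (variates, time) array into non-overlapping patches.

    Returns an array of shape (variates, n_patches, patch_len).
    """
    n_var, n_time = x.shape
    n_patches = n_time // patch_len
    return x[:, : n_patches * patch_len].reshape(n_var, n_patches, patch_len)


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (..., n_q, d) local tokens; context: (n_c, d) global tokens.
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
n_var, n_time, d_model = 3, 96, 16       # hypothetical sizes
fine_len, coarse_len = 8, 24             # hypothetical patch lengths
series = rng.standard_normal((n_var, n_time))

# Local tokens: fine patches, embedded separately for each variate.
fine = patchify(series, fine_len)                       # (3, 12, 8)
local_tokens = fine @ rng.standard_normal((fine_len, d_model))

# Global tokens: coarse patches with all variates concatenated per time block.
# A random projection stands in for the global (Mamba) module here.
coarse = patchify(series, coarse_len)                   # (3, 4, 24)
global_in = coarse.transpose(1, 0, 2).reshape(coarse.shape[1], -1)
global_tokens = global_in @ rng.standard_normal((n_var * coarse_len, d_model))

# Every variate's local tokens query the same cross-variate global context.
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
fused, attn = cross_attention(local_tokens, global_tokens, w_q, w_k, w_v)
```

In this sketch, `attn[v, i]` is a probability distribution over the coarse tokens for the `i`-th fine patch of variate `v`. Because the coarse tokens are built from all variates, each variate's local representation receives cross-variate information through the attention output.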
