PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting
Yongbo Yu, Weizhong Yu, Feiping Nie, Xuelong Li
TL;DR
This work addresses limitations of Transformer-based time series forecasting that stem from positional encodings and quadratic-cost attention, especially when long lookback windows are used. It introduces Pyramidal RNN Embedding (PRE), which combines Pyramid Temporal Convolution and Multi-Scale RNNs to learn multiscale, order-sensitive representations of univariate series, and integrates PRE with a standard Transformer encoder to form PRformer for multivariate forecasting. The approach yields substantial performance gains across eight datasets and achieves state-of-the-art results on long-horizon tasks, while keeping complexity scalable at $O(\frac{L}{W} + D^2)$. The results suggest that robust temporal representations and multiscale modeling are key to unlocking the full potential of Transformer-based predictors for real-world, high-dimensional time series forecasting.
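The claim that attention by itself ignores temporal order can be checked numerically. The sketch below is a minimal pure-Python single-head scaled dot-product self-attention with identity query, key and value projections (an assumption made for brevity; learned projections are applied per token and would not change the argument). Shuffling the input tokens only shuffles the output rows, so without positional information the layer cannot tell which time step came first. The function names are illustrative and not taken from the PRformer code.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention with Q = K = V = x (no positional encoding)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


tokens = [[0.1, 0.9], [0.5, -0.2], [-0.7, 0.3]]  # three time steps, two features each
perm = [2, 0, 1]
shuffled = [tokens[p] for p in perm]

out = self_attention(tokens)
out_shuffled = self_attention(shuffled)

# Each output row simply moves with its input row: attention(P x) == P attention(x).
same = all(
    math.isclose(a, b, abs_tol=1e-12)
    for p, row in zip(perm, out_shuffled)
    for a, b in zip(row, out[p])
)
print(same)  # True
```

Because the outputs are only permuted, any order-independent summary of them (such as a mean over time steps) is identical for the original and the shuffled sequence, which is why positional embeddings are normally added.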
Abstract
The self-attention mechanism in the Transformer architecture is insensitive to sequence order, so Transformers for time series prediction need positional embeddings to encode temporal order. We argue that this reliance on positional embeddings restricts the Transformer's ability to represent temporal sequences effectively, particularly when longer lookback windows are used. To address this, we introduce an approach that combines Pyramid RNN Embeddings (PRE) for univariate time series with the Transformer's capability to model multivariate dependencies. PRE uses pyramidal one-dimensional convolutional layers to construct multiscale convolutional features that preserve temporal order. RNNs layered atop these features then learn multiscale, order-sensitive time series representations. Integrating these representations into attention-based Transformer models yields significant performance gains. We present PRformer, a model integrating PRE with a standard Transformer encoder, which achieves state-of-the-art performance on various real-world datasets. This performance highlights the effectiveness of our approach in leveraging longer lookback windows and underscores the critical role of robust temporal representations in maximizing the Transformer's potential for prediction tasks. Code is available at https://github.com/usualheart/PRformer
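The PRE pipeline described above (pyramidal 1D convolutions, then one RNN per scale) can be sketched in pure Python. This is a minimal illustration under loudly stated assumptions, not the authors' implementation: a fixed averaging kernel [0.5, 0.5] with stride 2 stands in for the learned convolution layers, each scale gets its own untrained Elman RNN with seeded random weights, and the final hidden states of all scales are concatenated into one embedding. The names `build_pyramid`, `TinyRNN` and `pyramid_rnn_embedding` are hypothetical. In PRformer, such per-series embeddings are then passed to a Transformer encoder that models dependencies across variables.

```python
import math
import random
from typing import List


def strided_conv1d(x: List[float], kernel: List[float], stride: int) -> List[float]:
    """'Valid' 1D cross-correlation of x with kernel, moving by `stride` steps."""
    k = len(kernel)
    return [
        sum(kernel[j] * x[i + j] for j in range(k))
        for i in range(0, len(x) - k + 1, stride)
    ]


def build_pyramid(x: List[float], levels: int) -> List[List[float]]:
    """Level 0 is the raw series; each further level roughly halves the length.

    Assumption: a fixed averaging kernel with stride 2 replaces learned conv layers.
    """
    pyramid = [list(x)]
    for _ in range(levels - 1):
        pyramid.append(strided_conv1d(pyramid[-1], [0.5, 0.5], stride=2))
    return pyramid


class TinyRNN:
    """Untrained Elman RNN: h_t = tanh(w_x * x_t + W_h @ h_{t-1} + b)."""

    def __init__(self, hidden: int, seed: int) -> None:
        rng = random.Random(seed)  # fixed seed: deterministic toy weights
        self.w_x = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
        self.b = [0.0] * hidden

    def final_state(self, seq: List[float]) -> List[float]:
        h = [0.0] * len(self.b)
        for x_t in seq:  # steps are consumed in order, so the result depends on order
            new_h = []
            for i in range(len(h)):
                pre = self.w_x[i] * x_t + self.b[i]
                pre += sum(self.w_h[i][j] * h[j] for j in range(len(h)))
                new_h.append(math.tanh(pre))
            h = new_h
        return h


def pyramid_rnn_embedding(x: List[float], levels: int = 3, hidden: int = 4) -> List[float]:
    """Concatenate the final state of one RNN per pyramid level (scale)."""
    embedding: List[float] = []
    for level, seq in enumerate(build_pyramid(x, levels)):
        embedding.extend(TinyRNN(hidden, seed=level).final_state(seq))
    return embedding


series = [math.sin(0.3 * t) for t in range(16)]
emb = pyramid_rnn_embedding(series)
print(len(emb))  # 12 = 3 scales x 4 hidden units
print(emb != pyramid_rnn_embedding(series[::-1]))  # True: reversing the series changes the embedding
```

Because each pyramid level is about half as long as the previous one, the coarser RNNs run over progressively fewer steps (16, 8 and 4 here), while every scale still reads its sequence in temporal order.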
