PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting

Yongbo Yu; Weizhong Yu; Feiping Nie; Xuelong Li


TL;DR

This work targets two limitations of Transformer-based time series forecasting, the reliance on positional encodings and the quadratic cost of attention, which matter most when long lookback windows are used. It introduces Pyramidal RNN Embedding (PRE), which combines Pyramid Temporal Convolution and Multi-Scale RNNs to learn multiscale, order-sensitive representations of univariate series, and integrates PRE with a standard Transformer encoder to form PRformer for multivariate forecasting. The approach yields substantial performance gains across eight datasets and achieves state-of-the-art results on long-horizon tasks, while keeping complexity scalable at $O(\frac{L}{W} + D^2)$. The results suggest that robust temporal representations and multiscale modeling can unlock the full potential of Transformer-based predictors for real-world, high-dimensional time series forecasting.

Abstract

The self-attention mechanism in the Transformer architecture is invariant to sequence order and therefore requires positional embeddings to encode temporal order in time series prediction. We argue that this reliance on positional embeddings restricts the Transformer's ability to represent temporal sequences effectively, particularly when employing longer lookback windows. To address this, we introduce an approach that combines Pyramid RNN embeddings (PRE) for univariate time series with the Transformer's capability to model multivariate dependencies. PRE uses pyramidal one-dimensional convolutional layers to construct multiscale convolutional features that preserve temporal order. RNNs layered atop these features then learn multiscale time series representations that are sensitive to sequence order. Integrating these embeddings into Transformer models with attention mechanisms results in significant performance enhancements. We present PRformer, a model integrating PRE with a standard Transformer encoder, which demonstrates state-of-the-art performance on various real-world datasets. This performance highlights the effectiveness of our approach in leveraging longer lookback windows and underscores the critical role of robust temporal representations in maximizing the Transformer's potential for prediction tasks. Code is available at this repository: https://github.com/usualheart/PRformer.
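The embedding idea described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation (see the linked repository for that). It assumes a single channel, stacked non-overlapping convolutions whose stride equals the kernel length, a plain Elman-style RNN per scale, and random untrained weights. The names `strided_conv1d`, `pyramid_features`, `rnn_last_state`, and `pre_embedding`, as well as the window sizes, are hypothetical and chosen only for illustration.

```python
import math
import random


def strided_conv1d(x, kernel):
    """Non-overlapping 1D convolution: the stride equals the kernel length."""
    w = len(kernel)
    return [
        sum(k * v for k, v in zip(kernel, x[i:i + w]))
        for i in range(0, len(x) - w + 1, w)
    ]


def pyramid_features(x, kernels):
    """Apply the convolutions in sequence (an assumption); each level is
    a shorter, coarser view of the series that keeps temporal order."""
    levels = []
    current = x
    for kernel in kernels:
        current = strided_conv1d(current, kernel)
        levels.append(current)
    return levels


def rnn_last_state(seq, w_in, w_rec):
    """Elman-style RNN over scalar inputs; returns the final hidden state."""
    hidden = len(w_in)
    h = [0.0] * hidden
    for v in seq:
        h = [
            math.tanh(w_in[j] * v + sum(w_rec[j][k] * h[k] for k in range(hidden)))
            for j in range(hidden)
        ]
    return h


def pre_embedding(x, windows, hidden=4, seed=0):
    """Concatenate the final RNN state of every pyramid level.
    Weights are random placeholders, not trained parameters."""
    rng = random.Random(seed)
    kernels = [[rng.uniform(-0.5, 0.5) for _ in range(w)] for w in windows]
    embedding = []
    for level in pyramid_features(x, kernels):
        w_in = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)]
                 for _ in range(hidden)]
        embedding.extend(rnn_last_state(level, w_in, w_rec))
    return embedding


# Usage: a lookback window of 96 steps and hypothetical windows 2, 2, 3.
series = [math.sin(0.3 * t) for t in range(96)]
averaging = [[0.5, 0.5], [0.5, 0.5], [1 / 3, 1 / 3, 1 / 3]]
print([len(level) for level in pyramid_features(series, averaging)])  # [48, 24, 8]
print(len(pre_embedding(series, [2, 2, 3])))  # 12
```

Because each recurrent step depends on the previous hidden state, reversing the input generally changes the embedding, so no separate positional encoding is needed to make it order-sensitive. In a full model, one such embedding per variable would serve as a token for the Transformer encoder.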


Paper Structure (23 sections, 8 equations, 5 figures, 7 tables)


Introduction
Related work
PRformer
Model Architecture
Pyramid RNN Embedding (PRE)
Pyramid Convolution Block
Multi-Scale RNN Block
Pyramid Convolutional Layer Configuration Method
Transformer Encoder Multivariate Attention
Loss Function and Normalization
Complexity Analysis of PRformer
Experiments
Performance promotion with PRE
Long-term Time Series Forecasting
Model Analysis
...and 8 more sections

Figures (5)

Figure 1: Pyramidal RNN Embedding (PRE) Architecture. The actual pyramid structure is generated based on the settings.
Figure 2: The overall architecture of PRformer utilizes a Transformer encoder as its backbone, culminating in the generation of prediction results through a simple linear projection. (a) Pyramidal RNN Embedding (PRE) block. The initial sequences of diverse variables are independently fed into the PRE block to acquire distinct representations in the form of embeddings. (b) Multi-head self-attention is employed on the embeddings of multiple variables to capture intricate interdependencies among them.
Figure 3: The MAE results (Y-axis) of models with different lookback window sizes (X-axis) of long-term forecasting (T=96) on the Traffic and Electricity datasets. (a) Transformer results; (b) PRformer results.
Figure 4: In (a) we present the training times for one epoch on three datasets. In (b) we illustrate the corresponding memory usage. The experiments were conducted under equivalent hardware conditions and parameter configurations.
Figure 5: Correlation of original time series with their respective PRE embeddings across multiple datasets. Panels (a)-(e) show the original time series, while panels (f)-(j) show the corresponding t-SNE clustering results of the PRE embeddings. Nearby representations correspond to original sequences with similar shapes and patterns, whereas distant representations correspond to sequences that differ more substantially in shape, indicating that PRE acquires meaningful time series representations.
