Multi-scale Transformer Pyramid Networks for Multivariate Time Series Forecasting
Yifan Zhang, Rui Wu, Sergiu M. Dascalu, Frederick C. Harris
TL;DR
This work tackles multivariate time series (MTS) forecasting by addressing a limitation of existing transformer models, which operate at a single temporal scale or at multiple scales that grow exponentially. It introduces a dimension-invariant (DI) embedding that preserves both the time-step and variable dimensions while projecting the data into a higher-dimensional space. It also introduces a Multi-scale Transformer Pyramid Network (MTPNet) that models dependencies across unconstrained scales through a pyramid of encoder–decoder levels with inter-scale connections. The approach further decomposes the data into seasonal and trend components: an MTPNet forecasts the seasonal component, a linear model forecasts the trend, and the two predictions are combined through a final convolution. On nine real-world benchmarks, MTPNet outperforms state-of-the-art baselines with lower MSE and MAE, demonstrating the practical value of flexible multi-scale temporal modeling for MTS forecasting.
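The seasonal/trend split and the linear trend head can be illustrated with a small sketch. It assumes the trend is extracted with a centred moving average whose ends are padded by repeating the first and last values (a common choice in decomposition-based forecasters, not confirmed here as the authors' exact setting), and that the linear trend model is a single weight matrix mapping the L input steps to H future steps. The helper names `decompose` and `linear_trend_forecast` and all weights are hypothetical. The seasonal forecast produced by MTPNet and the final convolution that merges the two predictions are not shown.

```python
from typing import List, Tuple


def decompose(x: List[float], kernel: int) -> Tuple[List[float], List[float]]:
    """Split a univariate series into (seasonal, trend).

    Assumption: the trend is a centred moving average of odd length `kernel`.
    The series is padded by repeating its first and last values so the trend
    has the same length as the input; seasonal = input - trend.
    """
    if kernel % 2 == 0:
        raise ValueError("kernel must be odd")
    half = kernel // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    trend = [sum(padded[i:i + kernel]) / kernel for i in range(len(x))]
    seasonal = [v - t for v, t in zip(x, trend)]
    return seasonal, trend


def linear_trend_forecast(
    trend: List[float], weights: List[List[float]], bias: List[float]
) -> List[float]:
    """Hypothetical linear trend head: y[h] = sum_l W[h][l] * trend[l] + b[h]."""
    return [sum(w * t for w, t in zip(row, trend)) + b for row, b in zip(weights, bias)]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
seasonal, trend = decompose(x, kernel=3)
# A one-step horizon that copies the last trend value and adds an offset.
trend_pred = linear_trend_forecast(trend, weights=[[0.0] * 6 + [1.0]], bias=[0.5])
```

Because the seasonal part is defined as the residual, `seasonal[i] + trend[i]` reconstructs the input exactly, so each branch can specialize without losing information.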
Abstract
Multivariate Time Series (MTS) forecasting involves modeling temporal dependencies within historical records. Transformers have demonstrated remarkable performance in MTS forecasting because of their ability to capture long-term dependencies. However, prior work has been confined to modeling temporal dependencies either at a fixed scale or at multiple scales that increase exponentially (mostly with base 2). This limitation hinders their effectiveness in capturing diverse seasonalities, such as hourly and daily patterns. In this paper, we introduce a dimension-invariant embedding technique that captures short-term temporal dependencies and projects MTS data into a higher-dimensional space while preserving the time-step and variable dimensions of the data. Furthermore, we present a novel Multi-scale Transformer Pyramid Network (MTPNet), specifically designed to capture temporal dependencies at multiple unconstrained scales. The predictions are inferred from multi-scale latent representations obtained from transformers at various scales. Extensive experiments on nine benchmark datasets demonstrate that the proposed MTPNet outperforms recent state-of-the-art methods.
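To make "dimension-invariant" concrete, the following sketch maps an L × N input (time steps × variables) to a D × L × N tensor, so the embedding adds a channel dimension of size D without merging or dropping the time or variable axes. It assumes the embedding behaves like a 2D convolution with kernel shape (k, 1): a length-k kernel slides along time with zero padding, and the same kernel is shared across all variables, which is what lets it capture short-term temporal dependencies. The function name `di_embed`, the kernel size, and the weights are illustrative assumptions, not the paper's exact layer.

```python
from typing import List

Matrix = List[List[float]]  # indexed as [time][variable]


def di_embed(x: Matrix, weights: List[List[float]], bias: List[float]) -> List[Matrix]:
    """Hypothetical DI embedding: L x N input -> D x L x N output.

    weights[d] is an odd-length temporal kernel for output channel d, shared
    across all N variables (like a 2D convolution with kernel shape (k, 1)
    and 'same' zero padding along the time axis).
    """
    L, N = len(x), len(x[0])
    out: List[Matrix] = []
    for kernel, b in zip(weights, bias):
        half = len(kernel) // 2
        channel: Matrix = []
        for t in range(L):
            row = []
            for n in range(N):
                acc = b
                for j, w in enumerate(kernel):
                    src = t + j - half  # neighbouring time step
                    if 0 <= src < L:  # zero padding outside the series
                        acc += w * x[src][n]
                row.append(acc)
            channel.append(row)
        out.append(channel)
    return out


# 5 time steps, 2 variables, embedded into D = 3 channels with k = 3.
x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
W = [[0.0, 1.0, 0.0],          # identity: copies the input
     [1 / 3, 1 / 3, 1 / 3],    # local average over 3 steps
     [-1.0, 0.0, 1.0]]         # local difference (next minus previous)
e = di_embed(x, W, bias=[0.0, 0.0, 0.0])
```

Each output channel `e[d]` keeps the original 5 × 2 layout, so downstream modules can still attend over time steps and variables separately.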
