MultiResFormer: Transformer with Adaptive Multi-Resolution Modeling for General Time Series Forecasting

Linfeng Du, Ji Xin, Alex Labach, Saba Zuberi, Maksims Volkovs, Rahul G. Krishnan

TL;DR

The paper tackles the challenge of long-horizon time series forecasting where real-world data exhibit multiple coexisting periodicities. It introduces MultiResFormer, a Transformer with adaptive multi-resolution modeling that detects salient periodicities within each block and forms parallel patching branches at lengths aligned to these periods, enabling simultaneous interperiod and intraperiod modeling via a shared Transformer encoder. Key innovations include a periodicity-driven patching mechanism, an interpolation-based parameter-sharing strategy across resolutions, a resolution embedding scheme, and RevIN-based normalization to handle distribution shifts. Empirical results across long-term and short-term benchmarks show that MultiResFormer achieves state-of-the-art performance with substantially fewer parameters and competitive training efficiency, outperforming strong baselines like PatchTST and TimesNet on multiple datasets and horizons. The work contributes a practical, data-driven approach to adaptively fuse information across multiple time scales, with clear implications for applications in power, weather, transportation, and epidemiology where complex periodic patterns are prevalent.

Abstract

Transformer-based models have recently pushed the boundaries of time series forecasting. Existing methods typically encode time series data into $\textit{patches}$ using one or a fixed set of patch lengths. This, however, can limit their ability to capture the variety of intricate temporal dependencies present in real-world multi-periodic time series. In this paper, we propose MultiResFormer, which dynamically models temporal variations by adaptively choosing optimal patch lengths. Concretely, at the beginning of each layer, time series data is encoded into several parallel branches, each using a detected periodicity, before going through the Transformer encoder block. We conduct extensive evaluations on long- and short-term forecasting datasets, comparing MultiResFormer with state-of-the-art baselines. MultiResFormer outperforms patch-based Transformer baselines on long-term forecasting tasks and also consistently outperforms CNN baselines by a large margin, while using far fewer parameters than these baselines.
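
The abstract says each branch uses "a detected periodicity" but does not say how the detection is done. The sketch below illustrates one common approach, assumed here rather than taken from the paper: rank the frequency bins of a discrete Fourier transform by amplitude and convert the strongest bins into patch lengths. The function names `dft_amplitudes` and `salient_periods`, the `top_k` parameter, and the rounding rule `ceil(n / k)` are all hypothetical choices for illustration. It uses only the Python standard library, so the DFT is a naive O(n^2) loop.

```python
import cmath
import math


def dft_amplitudes(x):
    """Amplitude of each non-negative frequency bin of a real series (naive DFT)."""
    n = len(x)
    amps = []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        amps.append(abs(coeff))
    return amps


def salient_periods(x, top_k=2):
    """Periods (in time steps) of the top_k strongest non-zero frequency bins.

    Assumption: a bin with index k over a window of length n maps to a
    period of ceil(n / k) steps. This is an illustrative rule only.
    """
    n = len(x)
    amps = dft_amplitudes(x)
    # Skip bin 0 (the series mean) and rank the remaining bins by amplitude.
    ranked = sorted(range(1, len(amps)), key=lambda k: amps[k], reverse=True)
    return [math.ceil(n / k) for k in ranked[:top_k]]


# Toy look-back window of 96 steps with a strong 24-step cycle and a weaker 8-step cycle.
series = [
    math.sin(2 * math.pi * t / 24) + 0.5 * math.sin(2 * math.pi * t / 8)
    for t in range(96)
]
periods = salient_periods(series, top_k=2)
print(periods)  # [24, 8]
```

Each returned period would then serve as the patch length of one resolution branch, so the branches change with the input rather than being fixed in advance.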

Paper Structure (19 sections, 3 equations, 6 figures, 6 tables)

This paper contains 19 sections, 3 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Different ways to form multiple resolutions. Left: the hierarchy formed by iterative down-sampling with fixed rates. Each resolution is a coarsened version of the previous one. Right: Multiple resolutions adaptively formed by patching with salient periodicities. This breaks the dependency between resolutions and preserves all information in the original series at each resolution.
  • Figure 2: MultiResFormer architecture. Left: the MultiResFormer model consists of Instance Normalization/De-normalization at the beginning/end; in between there is a stack of $N$ MultiResFormer blocks (in the dotted box), followed by a Linear Prediction Head. We mark the shape of the intermediate tensors for better understanding. $PL_i$ denotes the patch length used in the $i$-th resolution branch, which corresponds to the $i$-th salient periodicity of $\mathbf{X}^{(l)}$. $NP_i$ denotes the number of patches after non-overlapping patching. $D$ denotes the model size that we interpolate each of the patches to. Right: at each resolution branch, $\mathbf{X}^{(l)}$ is padded, split into patches, interpolated into fixed-size embeddings, passed through a shared Transformer Block, interpolated back to its original patch length, flattened, and truncated.
  • Figure 3: Forecasting performance with varying look-back windows on 4 ETT datasets.
  • Figure 4: Representation analysis for long term forecasting on the Weather dataset.
  • Figure 5: Model efficiency comparison on the ETTh1 dataset.
  • ...and 1 more figure
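
The per-branch steps in the Figure 2 caption (pad, split into patches, interpolate to size $D$, apply the shared block, interpolate back, flatten, truncate) can be sketched for a single univariate series as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: zero-padding at the end, linear interpolation, and the names `resize`, `resolution_branch`, and `shared_block` are all hypothetical. The shared Transformer block is replaced by a pluggable function that defaults to the identity.

```python
def resize(values, new_len):
    """Linearly interpolate a 1-D list to new_len points, keeping the endpoints."""
    old_len = len(values)
    if new_len == 1 or old_len == 1:
        return [values[0]] * new_len
    out = []
    for i in range(new_len):
        pos = i * (old_len - 1) / (new_len - 1)  # position on the old grid
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def resolution_branch(x, patch_len, d_model, shared_block=lambda patches: patches):
    """Process one resolution branch for a univariate series x (list of floats)."""
    seq_len = len(x)
    # 1. Pad so the length is a multiple of patch_len (zero-padding is an assumption).
    num_patches = -(-seq_len // patch_len)  # ceiling division
    padded = x + [0.0] * (num_patches * patch_len - seq_len)
    # 2. Split into non-overlapping patches of length patch_len.
    patches = [padded[i * patch_len:(i + 1) * patch_len] for i in range(num_patches)]
    # 3. Interpolate every patch to the fixed model size d_model.
    embedded = [resize(p, d_model) for p in patches]
    # 4. Apply the block shared across resolutions (placeholder for a Transformer block).
    encoded = shared_block(embedded)
    # 5. Interpolate each patch back to its original patch length.
    restored = [resize(p, patch_len) for p in encoded]
    # 6. Flatten and truncate the padding to recover the input length.
    flat = [v for p in restored for v in p]
    return flat[:seq_len]


x = [float(t) for t in range(10)]
out = resolution_branch(x, patch_len=4, d_model=7)
print(len(out))  # 10
```

Because every patch is resized to the same width `d_model` regardless of `patch_len`, the same `shared_block` can serve branches with different patch lengths, which is the parameter-sharing idea described in the TL;DR.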