Pathformer: Multi-scale Transformers with Adaptive Pathways for Time Series Forecasting

Peng Chen, Yingying Zhang, Yunyao Cheng, Yang Shu, Yihang Wang, Qingsong Wen, Bin Yang, Chenjuan Guo

TL;DR

Pathformer addresses time series forecasting across multiple temporal scales with a multi-scale Transformer that models both temporal resolution and temporal distance, using patch division at several patch sizes and dual attention over the resulting patches. Adaptive pathways, built from a router and an aggregator and informed by seasonality and trend decomposition, select and fuse scale-specific features according to the input. Experiments on nine real-world datasets plus the larger Wind Power and PEMS07 benchmarks (eleven datasets in total) report state-of-the-art accuracy and strong generalization under transfer learning, with lightweight part-tuning offered as an efficient option for transfer.

Abstract

Transformers for time series forecasting mainly model time series from limited or fixed scales, making it challenging to capture different characteristics spanning various scales. We propose Pathformer, a multi-scale Transformer with adaptive pathways. It integrates both temporal resolution and temporal distance for multi-scale modeling. Multi-scale division divides the time series into different temporal resolutions using patches of various sizes. Based on the division of each scale, dual attention is performed over these patches to capture global correlations and local details as temporal dependencies. We further enrich the multi-scale Transformer with adaptive pathways, which adaptively adjust the multi-scale modeling process based on the varying temporal dynamics of the input, improving the accuracy and generalization of Pathformer. Extensive experiments on eleven real-world datasets demonstrate that Pathformer not only achieves state-of-the-art performance by surpassing all current models but also exhibits stronger generalization abilities under various transfer scenarios. The code is made available at https://github.com/decisionintelligence/pathformer.
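
The multi-scale division and dual attention described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes non-overlapping patches, a series length divisible by every patch size, scalar tokens with identity projections, and patch means as patch-level summaries. The function names (divide_into_patches, self_attention, dual_attention) are hypothetical.

```python
import math

def divide_into_patches(series, patch_size):
    """Split a sequence into non-overlapping patches of length patch_size."""
    if len(series) % patch_size != 0:
        raise ValueError("series length must be divisible by patch_size")
    return [series[i:i + patch_size] for i in range(0, len(series), patch_size)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Toy scalar dot-product self-attention with identity projections."""
    out = []
    for q in tokens:
        weights = softmax([q * k for k in tokens])
        out.append(sum(w * v for w, v in zip(weights, tokens)))
    return out

def dual_attention(series, patch_size):
    """Assumed simplification: attention inside patches, then across patches."""
    patches = divide_into_patches(series, patch_size)
    # Local details: attention among the time steps inside each patch.
    intra = [self_attention(p) for p in patches]
    # Global correlations: attention among patch-level summaries (means here).
    summaries = [sum(p) / len(p) for p in patches]
    inter = self_attention(summaries)
    return intra, inter

series = [0.1 * t for t in range(12)]
for patch_size in (2, 4, 6):  # each patch size is one temporal resolution
    intra, inter = dual_attention(series, patch_size)
    print(patch_size, len(intra), len(intra[0]), len(inter))
```

Smaller patch sizes yield more, shorter patches (finer resolution), while larger ones yield fewer, longer patches. The attention inside a patch only sees nearby time steps, whereas the attention across patch summaries spans the whole input.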

Paper Structure

This paper contains 23 sections, 7 equations, 6 figures, 10 tables.

Figures (6)

  • Figure 1: Left: Time series are divided into patches of varying sizes, each size corresponding to a temporal resolution. The intervals in blue, orange, and red represent different patch sizes. Right: Local details (black arrows) and global correlations (colored arrows) are modeled through different temporal distances.
  • Figure 2: The architecture of Pathformer. The Multi-scale Transformer Block (MST Block) comprises patch division with multiple patch sizes and dual attention. The adaptive pathways select the patch sizes with the top $K$ weights generated by the router to capture multi-scale characteristics, and the selected patch sizes are represented in blue. Then, the aggregator applies weighted aggregation to the characteristics obtained from the MST Block.
  • Figure 3: (a) The structure of the Multi-Scale Transformer Block, which mainly consists of Patch Division, Inter-patch attention, and Intra-patch attention. (b) The structure of the Multi-Scale Router.
  • Figure 4: The average pathway weights of different patch sizes for the Weather dataset. $B_1$, $B_2$, and $B_3$ denote distinct AMS (Adaptive Multi-Scale) blocks, while $S_1$, $S_2$, $S_3$, and $S_4$ represent the patch sizes within each AMS block, in decreasing order of size.
  • Figure 5: Results with different input lengths for ETTh1, ETTh2, Weather, and Electricity.
  • ...and 1 more figure
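
The routing and aggregation step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the paper's code. It assumes the router emits one logit per patch size, that a softmax is applied before the top $K$ selection, and that unselected weights are set to zero without renormalization. The names route_top_k and aggregate and the example values are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(logits, k):
    """Softmax the router logits, keep the k largest weights, zero the rest."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of patch sizes")
    weights = softmax(logits)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    keep = set(ranked[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

def aggregate(weights, scale_outputs):
    """Weighted sum of per-scale outputs; unselected scales are skipped."""
    horizon = len(scale_outputs[0])
    fused = [0.0] * horizon
    for w, out in zip(weights, scale_outputs):
        if w == 0.0:
            continue  # pathway not selected, so it contributes nothing
        for t in range(horizon):
            fused[t] += w * out[t]
    return fused

# Hypothetical router logits for four patch sizes and their per-scale outputs.
router_logits = [2.0, 0.5, 1.0, -1.0]
scale_outputs = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]

weights = route_top_k(router_logits, k=2)
fused = aggregate(weights, scale_outputs)
print([i for i, w in enumerate(weights) if w > 0])  # → [0, 2]
print(fused)
```

Only the selected patch sizes contribute to the fused output, so in a full model the blocks for unselected patch sizes would not need to be evaluated for that input.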