PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting

Yiming Niu, Jinliang Deng, Yongxin Tong

TL;DR

PhaseFormer reframes long-horizon time-series forecasting by replacing patch-based tokens with phase tokens and producing phase-wise predictions through a lightweight cross-phase routing Transformer. The method combines data-driven phase extraction with a two-stage routing mechanism and a shared predictor, achieving state-of-the-art accuracy with roughly $10^3$ parameters while dramatically reducing FLOPs, especially on large-scale datasets such as Traffic and Electricity. The authors theoretically justify phase-token stability under cycle-pattern drifts and provide empirical evidence across seven benchmarks, including ablations and case studies, to demonstrate both robustness and efficiency. The accompanying code is available for reproducibility.

Abstract

Periodicity is a fundamental characteristic of time series data and has long played a central role in forecasting. Recent deep learning methods strengthen the exploitation of periodicity by treating patches as basic tokens, thereby improving predictive effectiveness. However, their efficiency remains a bottleneck due to large parameter counts and heavy computational costs. This paper provides, for the first time, a clear explanation of why patch-level processing is inherently inefficient, supported by strong evidence from real-world data. To address these limitations, we introduce a phase perspective for modeling periodicity and present an efficient yet effective solution, PhaseFormer. PhaseFormer features phase-wise prediction through compact phase embeddings and efficient cross-phase interaction enabled by a lightweight routing mechanism. Extensive experiments demonstrate that PhaseFormer achieves state-of-the-art performance with around 1k parameters, consistently across benchmark datasets. Notably, it excels on large-scale and complex datasets, where models with comparable efficiency often struggle. This work marks a significant step toward truly efficient and effective time series forecasting. Code is available at this repository: https://github.com/neumyor/PhaseFormer_TSL
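To make the patch-versus-phase distinction concrete, here is a minimal sketch. It assumes a univariate series with a known, fixed period, and it assumes that a phase token gathers the values at the same offset within each cycle. The helper names `patch_tokens` and `phase_tokens` are hypothetical and do not come from the paper or its repository; the paper's data-driven phase extraction, routing mechanism, and predictor are not reproduced here.

```python
import numpy as np


def patch_tokens(series: np.ndarray, period: int) -> np.ndarray:
    """Split a 1-D series into contiguous patches, one per full cycle.

    Returns an array of shape (n_cycles, period). Trailing values that do
    not fill a complete cycle are dropped (an assumption of this sketch).
    """
    n_cycles = len(series) // period
    return series[: n_cycles * period].reshape(n_cycles, period)


def phase_tokens(series: np.ndarray, period: int) -> np.ndarray:
    """Group values that share the same offset within the cycle.

    Returns an array of shape (period, n_cycles): row p holds the values
    observed at phase p in every cycle.
    """
    return patch_tokens(series, period).T


# Toy series covering 3 cycles of length 4.
period = 4
series = np.arange(12, dtype=float)

patches = patch_tokens(series, period)  # each row is one contiguous cycle
phases = phase_tokens(series, period)   # each row is one phase across cycles
```

In this toy layout a patch token spans consecutive time steps, whereas a phase token spans the same time-of-cycle position across successive cycles, which is the change of viewpoint the abstract describes.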

Paper Structure

This paper contains 31 sections, 6 theorems, 31 equations, 10 figures, 7 tables.

Key Result

Theorem 1

Let $X = A G^\top + N \in \mathbb{R}^{D\times H}$ with $\operatorname{rank}(A)=\operatorname{rank}(G)=r \ll \min(D,H)$, and consider the transformed data where $\|N'\|_2 \le \|S\|_2\|N\|_2$, $\|R\|_2 \le \varepsilon(\|M\|_F+\|N\|_F)$, and let $\delta_{\min} > 0$ denote the minimal spectral separation. Then there exists a universal constant $C>0$ such that:

Figures (10)

  • Figure 1: Comparison between patch-based and phase-based representations for time-series forecasting. (a) illustrates the difference in tokenization. (b) jointly evaluates model accuracy, parameter scale, and computational overhead on the Traffic dataset, where marker size indicates FLOPs.
  • Figure 2: Visualization of phase tokenization and its advantages. (a) Phase tokenization yields more stable representations than patch-based embeddings. (b) Phase tokens exhibit clear low-dimensionality compared with patch tokens.
  • Figure 3: The overview of PhaseFormer.
  • Figure 4: Comparison of FLOPs and parameter counts across models on the Traffic and Electricity datasets. Patch-based models are shown in green, phase-based models in blue, and other models in gray.
  • Figure 5: Effect of varying the number of routers $M$ on forecasting performance on three datasets.
  • ...and 5 more figures

Theorems & Definitions (8)

  • Theorem 1: Phase Tokenization Stability
  • Lemma 1: Column space preservation
  • Lemma 2: Row space change
  • Lemma 3: Wedin’s $\sin\Theta$ theorem
  • Theorem 2
  • Proof 1
  • Theorem 3: Stability under day-wise perturbations
  • Proof 2