Fredformer: Frequency Debiased Transformer for Time Series Forecasting

Xihao Piao, Zheng Chen, Taichi Murayama, Yasuko Matsubara, Yasushi Sakurai

TL;DR

Fredformer tackles frequency bias in Transformer-based time series forecasting, where low-frequency components tend to dominate learning due to energy concentration. It introduces a frequency-debiased Transformer built on a DFT-to-IDFT backbone, frequency refinement/normalization, and local independent modeling to equalize learning across frequency bands, plus a Nyström-based lightweight variant for efficiency. Empirical results on eight real-world datasets show state-of-the-art forecasting accuracy and clear reduction in frequency bias compared with baselines, with ablations confirming the importance of each component. The method enables more reliable short- and mid-term forecasts in diverse domains and offers a practical path toward scalable frequency-aware Transformers.
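
As a rough illustration of the backbone described above, the NumPy sketch below traces one forward pass: real DFT, frequency-wise local normalization, patching, channel-wise attention inside each frequency patch, a frequency-wise summarizing layer, and IDFT. It is not the authors' implementation (the official code is linked in the abstract). The function name `fredformer_like_forward`, the parameters `patch_len` and `d_model`, the per-patch peak-magnitude normalization, and the linear summarizing layer are all assumptions. Every weight is random and untrained, so only the shapes and the data flow are meaningful.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fredformer_like_forward(x, horizon, patch_len=4, d_model=16, seed=0):
    """Untrained sketch: DFT -> local norm -> patching -> channel attention -> summarize -> IDFT.

    x: real array of shape (channels, lookback). Returns a real array of shape (channels, horizon).
    All weights are random stand-ins (hypothetical), so the output only shows shapes and data flow.
    """
    rng = np.random.default_rng(seed)
    C, L = x.shape
    spec = np.fft.rfft(x, axis=-1)                     # (C, F) complex, F = L // 2 + 1
    F = spec.shape[-1]
    N = -(-F // patch_len)                             # number of frequency patches (ceil)
    spec = np.pad(spec, ((0, 0), (0, N * patch_len - F)))
    patches = spec.reshape(C, N, patch_len)            # (C, N, P) complex

    # Assumed local normalization: divide each frequency patch (across all channels)
    # by its own peak magnitude, so every patch has the same maximum amplitude.
    scale = np.abs(patches).max(axis=(0, 2), keepdims=True) + 1e-8   # (1, N, 1)
    patches = patches / scale
    tokens = np.concatenate([patches.real, patches.imag], axis=-1)  # (C, N, 2P)

    # Hypothetical random weights standing in for learned parameters.
    W_in = rng.normal(size=(2 * patch_len, d_model)) / np.sqrt(2 * patch_len)
    W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3))
    W_out = rng.normal(size=(d_model, 2 * patch_len)) / np.sqrt(d_model)

    h = tokens @ W_in                                  # (C, N, d)
    h = np.transpose(h, (1, 0, 2))                     # (N, C, d): attend across channels per patch
    q, k, v = h @ W_q, h @ W_k, h @ W_v
    attn = softmax(q @ np.transpose(k, (0, 2, 1)) / np.sqrt(d_model))  # (N, C, C)
    h = np.transpose(attn @ v, (1, 0, 2))              # (C, N, d)

    out = h @ W_out                                    # (C, N, 2P)
    out = (out[..., :patch_len] + 1j * out[..., patch_len:]) * scale  # undo local normalization
    out = out.reshape(C, N * patch_len)[:, :F]         # drop padding -> (C, F)

    # Assumed linear frequency-wise summarizing layer: F input bins -> horizon's rfft bins.
    F_out = horizon // 2 + 1
    W_sum = rng.normal(size=(F, F_out)) / np.sqrt(F)
    out_spec = out @ W_sum                             # (C, F_out) complex
    return np.fft.irfft(out_spec, n=horizon, axis=-1)  # back to the time domain

x = np.random.default_rng(1).normal(size=(3, 96))     # 3 channels, lookback 96
y = fredformer_like_forward(x, horizon=24)
print(y.shape)  # (3, 24)
```

In this sketch, attention runs over channels within one frequency patch at a time, so each attention map only compares components that share a frequency band and a common normalization scale. This is one reading of the "local independent modeling" in the summary, not a claim about the exact Fredformer layer.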

Abstract

The Transformer model has shown leading performance in time series forecasting. Nevertheless, in some complex scenarios, it tends to learn low-frequency features in the data and overlook high-frequency features, showing a frequency bias. This bias prevents the model from accurately capturing important high-frequency data features. In this paper, we undertook empirical analyses to understand this bias and discovered that frequency bias results from the model disproportionately focusing on frequency features with higher energy. Based on our analysis, we formulate this bias and propose Fredformer, a Transformer-based framework designed to mitigate frequency bias by learning features equally across different frequency bands. This approach prevents the model from overlooking lower-amplitude features that are important for accurate forecasting. Extensive experiments show the effectiveness of our proposed approach, which can outperform other baselines on different real-world time-series datasets. Furthermore, we introduce a lightweight variant of the Fredformer with an attention matrix approximation, which achieves comparable performance with far fewer parameters and lower computation costs. The code is available at: https://github.com/chenzRG/Fredformer
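
The energy argument can be made concrete with a toy signal (our own example, not an experiment from the paper): a unit-amplitude low-frequency sinusoid plus a high-frequency sinusoid with one tenth of the amplitude. Because spectral energy scales with squared amplitude, the weak component holds only about 1% of the energy, and a prediction that drops it entirely still attains a small mean squared error. This gives one intuition for why a model trained to minimize MSE gets little pressure to learn low-energy components.

```python
import numpy as np

L = 256
t = np.arange(L)
low = 1.0 * np.sin(2 * np.pi * 2 * t / L)    # strong low-frequency component (2 cycles per window)
high = 0.1 * np.sin(2 * np.pi * 40 * t / L)  # weak high-frequency component (40 cycles per window)
x = low + high

power = np.abs(np.fft.rfft(x)) ** 2          # spectral energy per frequency bin
share = power / power.sum()
print(f"energy share at bin 2:  {share[2]:.4f}")   # 0.9901
print(f"energy share at bin 40: {share[40]:.4f}")  # 0.0099

# A forecast that reproduces only the dominant low-frequency component
# misses the high-frequency part entirely, yet its MSE is tiny.
mse = np.mean((x - low) ** 2)
print(f"MSE with the high-frequency part ignored: {mse:.4f}")      # 0.0050
print(f"variance of the signal:                   {x.var():.4f}")  # 0.5050
```

The ignored component accounts for under 1% of the signal variance, so an MSE-driven optimizer sees almost no loss from missing it, even though it may carry meaningful structure.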

Paper Structure (33 sections, 2 theorems, 33 equations, 15 figures, 9 tables, 2 algorithms)

Key Result

Lemma 1

Frequency-wise Local Normalization: Given frequency patches $\forall\, \mathbf{W}_n,\, \mathbf{W}_m \in \mathbf{W}$ for $\max(\mathbf{W}_n) > \max(\mathbf{W}_m)$ and $\sigma(\cdot)$, the normalization strategy is defined by: […] This ensures that within each localized frequency patch $\mathbf{W}_n$, the amplitude differences between key frequency components are minimized, promoting equal attention to …
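
A hypothetical sketch of frequency-wise local normalization follows. It assumes that $\sigma(\cdot)$ divides each patch by its own peak amplitude; the paper's exact $\sigma$ is not reproduced in this summary and may differ (for example, a mean/standard-deviation normalization). The helper name `local_normalize` is illustrative only.

```python
import numpy as np

def local_normalize(amplitudes, patch_len):
    """Split a 1-D amplitude spectrum into consecutive patches and divide each
    patch by its own peak (an assumed form of sigma; trailing bins that do not
    fill a whole patch are dropped in this sketch)."""
    n_patches = len(amplitudes) // patch_len
    patches = amplitudes[: n_patches * patch_len].reshape(n_patches, patch_len)
    peak = patches.max(axis=1, keepdims=True)
    return patches / np.where(peak > 0, peak, 1.0)  # leave all-zero patches unchanged

# Toy amplitude spectrum: the first (low-frequency) patch carries far more energy than the second.
amps = np.array([10.0, 6.0, 3.0, 1.0, 0.4, 0.2, 0.1, 0.05])
W = amps.reshape(2, 4)
W_star = local_normalize(amps, patch_len=4)
print(W.max(axis=1).tolist())       # [10.0, 0.4]
print(W_star.max(axis=1).tolist())  # [1.0, 1.0]
print(W_star[1].tolist())           # [1.0, 0.5, 0.25, 0.125]
```

Before normalization the patch peaks are 10 and 0.4, which is exactly the situation $\max(\mathbf{W}_n) > \max(\mathbf{W}_m)$ in the lemma; afterwards both peaks are 1, while the relative amplitudes inside each patch are unchanged. Under this assumed $\sigma$, the higher-energy patch no longer dominates by scale alone.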

Figures (15)

  • Figure 1: In contrast to FEDformer [ICMLFedformer], a frequency-modeling-based method, and PatchTST [PatchTST], a state-of-the-art method, our model accurately captures more of the significant mid-to-high-frequency components.
  • Figure 2: Panel (a) shows the learning dynamics and results for two synthetic datasets, using line graphs to illustrate amplitudes in the frequency domain and heatmaps to represent per-epoch training errors. Panel (b) explores the influence of amplitude and domain on learning by comparing Transformers in the time and frequency domains, both with and without frequency local normalization.
  • Figure 3: Overview of our framework. Fredformer employs DFT to transform input sequences into the frequency domain, normalizes locally, and segments into patches before employing channel-wise attention, yielding final predictions through a frequency-wise summarizing layer and IDFT.
  • Figure 4: Visualizations of the learning dynamics and results for Fredformer and baselines on the ETTh1 dataset, employing line graphs to illustrate amplitudes in the frequency domain and heatmaps to represent training epoch errors.
  • Figure 5: This figure compares prediction accuracy and computational cost (VRAM usage) among Transformer-based methods, Fredformer (Ours), and its optimized variant, Nyström-Fredformer (Ours*).
  • ...and 10 more figures
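
Figure 5 refers to Nyström-Fredformer, the lightweight variant that approximates the attention matrix. The sketch below shows a standard Nyström approximation of softmax attention with segment-mean landmarks, as in Nyströmformer. Whether Fredformer uses the same landmark scheme is an assumption here; this sketch also uses an exact pseudoinverse (`np.linalg.pinv`) where Nyströmformer uses an iterative approximation, and the function names are hypothetical. With $m$ landmarks, the $n \times n$ attention matrix is replaced by $n \times m$, $m \times m$, and $m \times n$ factors, so memory grows as $O(nm)$ instead of $O(n^2)$.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def exact_attention(q, k, v):
    """Standard softmax attention; materializes the full n x n matrix."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def nystrom_attention(q, k, v, num_landmarks):
    """Nyström approximation of softmax attention using segment-mean landmarks
    (the Nyströmformer construction, assumed here)."""
    n, d = q.shape
    if n % num_landmarks != 0:
        raise ValueError("this sketch requires n to be divisible by num_landmarks")
    seg = n // num_landmarks
    q_l = q.reshape(num_landmarks, seg, d).mean(axis=1)  # (m, d) landmark queries
    k_l = k.reshape(num_landmarks, seg, d).mean(axis=1)  # (m, d) landmark keys
    s = 1.0 / np.sqrt(d)
    f = softmax(q @ k_l.T * s)    # (n, m)
    a = softmax(q_l @ k_l.T * s)  # (m, m)
    b = softmax(q_l @ k.T * s)    # (m, n)
    # Multiply right to left so no n x n matrix is ever formed.
    return f @ (np.linalg.pinv(a) @ (b @ v))

rng = np.random.default_rng(0)
n, d = 32, 8
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
exact = exact_attention(q, k, v)
full = nystrom_attention(q, k, v, num_landmarks=n)    # one token per segment
approx = nystrom_attention(q, k, v, num_landmarks=4)  # 4 landmarks, 8 tokens each
print("max abs error, 32 landmarks:", np.abs(full - exact).max())
print("max abs error,  4 landmarks:", np.abs(approx - exact).max())
```

Setting `num_landmarks` equal to the number of tokens recovers exact attention, because $A A^{+} A = A$ for any matrix $A$; fewer landmarks trade approximation accuracy for lower memory, which is the kind of trade-off Figure 5 measures.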

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Lemma 1
  • Lemma 2