
TIFO: Time-Invariant Frequency Operator for Stationarity-Aware Representation Learning in Time Series

Xihao Piao, Zheng Chen, Lingwei Zhu, Yushun Dong, Yasuko Matsubara, Yasushi Sakurai

TL;DR

TIFO, a Time-Invariant Frequency Operator, learns stationarity-aware weights over the frequency spectrum across the entire dataset, thereby mitigating distribution shift in time series.
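The TL;DR describes re-weighting each sample's spectrum with weights shared across the whole dataset. Below is a minimal NumPy sketch of that forward pass. It assumes a real FFT along the time axis, one real-valued weight per frequency bin, and weights that have already been learned. In the paper the weights are trained with the forecasting loss; here they are just an array. The function name `apply_frequency_weights` is hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def apply_frequency_weights(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical TIFO-style re-weighting of a batch of univariate series.

    x:       real array of shape (batch, length).
    weights: array of shape (length // 2 + 1,), one weight per rFFT bin,
             shared by every sample (assumed to be learned elsewhere).
    """
    # Move each sample into the frequency domain (real FFT along time).
    spectrum = np.fft.rfft(x, axis=-1)
    # Scale every frequency component by the same dataset-level weight.
    weighted = spectrum * weights
    # Return to the time domain so any forecasting backbone can consume it.
    return np.fft.irfft(weighted, n=x.shape[-1], axis=-1)


# Toy usage: suppress one frequency (bin 10) and keep another (bin 3).
t = np.arange(96)
slow = np.sin(2 * np.pi * 3 * t / 96)
fast = np.sin(2 * np.pi * 10 * t / 96)
weights = np.ones(96 // 2 + 1)
weights[10] = 0.0
filtered = apply_frequency_weights((slow + fast)[None, :], weights)
```

With all weights equal to one, the operator is the identity. Setting a weight to zero removes that frequency entirely, which is the extreme case of suppressing a non-stationary component. In a deep learning framework, `weights` would be a trainable parameter so that the backbone's loss can update it.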

Abstract

Nonstationary time series forecasting suffers from distribution shift because the training and test data are generated by different distributions. Existing methods attempt to alleviate this problem by, e.g., removing low-order moments (such as the mean and variance) from each individual sample. These solutions fail to capture the underlying time-evolving structure across samples and do not model complex temporal structure. In this paper, we aim to address distribution shift in the frequency space by considering all possible time structures. To this end, we propose a Time-Invariant Frequency Operator (TIFO), which learns stationarity-aware weights over the frequency spectrum across the entire dataset. The learned weights highlight stationary frequency components while suppressing non-stationary ones, thereby mitigating distribution shift in time series. To justify our method, we show that the Fourier transform of time series data implicitly induces an eigendecomposition in the frequency space. TIFO is a plug-and-play approach that can be seamlessly integrated into various forecasting models. Experiments show that our method achieves 18 top-1 and 6 top-2 results out of 28 forecasting settings. Notably, it yields 33.3% and 55.3% improvements in average MSE on the ETTm2 dataset. In addition, TIFO reduces computational costs by 60% to 70% compared to baseline methods, demonstrating strong scalability across diverse forecasting models.
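The abstract argues that removing low-order moments from each sample is not enough. The toy NumPy example below illustrates this point using synthetic signals chosen here, not data from the paper. It z-scores two series from different regimes. Afterward, both have mean 0 and standard deviation 1, yet their amplitude spectra still peak at different frequencies. The helper names `zscore` and `dominant_bin` are illustrative.

```python
import numpy as np


def zscore(x: np.ndarray) -> np.ndarray:
    """Per-sample normalization: remove the first two moments (mean and std)."""
    return (x - x.mean()) / x.std()


def dominant_bin(x: np.ndarray) -> int:
    """Index of the largest non-DC amplitude in the real FFT of x."""
    amplitude = np.abs(np.fft.rfft(x))
    return int(np.argmax(amplitude[1:]) + 1)


t = np.arange(128)
# Two synthetic samples from different regimes: different level, scale and period.
a = 5.0 + 2.0 * np.sin(2 * np.pi * 4 * t / 128)
b = -3.0 + 0.5 * np.sin(2 * np.pi * 16 * t / 128)
za, zb = zscore(a), zscore(b)

# Both now have mean ~0 and std ~1, but their spectra still peak at different bins.
print(dominant_bin(za), dominant_bin(zb))  # 4 16
```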

Paper Structure

This paper contains 31 sections, 2 theorems, 2 equations, 12 figures, 17 tables, and 8 algorithms.

Key Result

Theorem 1

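A continuous, shift-invariant kernel $k(x, y) = k(x - y)$, which measures the similarity between inputs $x$ and $y$, is valid (positive definite) if and only if it is the Fourier transform of a non-negative measure. When the kernel is normalized so that $k(0) = 1$, this measure is a probability distribution $p(\omega)$, so that $k(x - y) = \mathbb{E}_{\omega \sim p}\left[e^{i\omega^{\top}(x - y)}\right]$.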
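Below is a quick numerical check of Theorem 1 for one kernel whose spectral distribution is known in closed form. The Gaussian (RBF) kernel corresponds to a Gaussian $p(\omega)$. The Monte Carlo average (the random-Fourier-features construction) is an illustration chosen here, not the paper's use of the theorem. The function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Shift-invariant RBF kernel k(x - y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return float(np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2)))


def bochner_estimate(x, y, sigma=1.0, num_samples=200_000, seed=0):
    """Monte Carlo estimate of k(x - y) from its spectral distribution p(omega)."""
    rng = np.random.default_rng(seed)
    # For the RBF kernel, p(omega) is the Gaussian N(0, sigma^-2 I).
    omega = rng.normal(0.0, 1.0 / sigma, size=(num_samples, x.shape[0]))
    # k(x - y) = E_omega[exp(i omega^T (x - y))]; the imaginary part averages to 0.
    return float(np.mean(np.cos(omega @ (x - y))))


x = np.array([0.3, -1.2])
y = np.array([1.0, 0.5])
print(gaussian_kernel(x, y), bochner_estimate(x, y))
```

The exact value here is $e^{-1.69} \approx 0.185$, and the sampled estimate agrees to within Monte Carlo error.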

Figures (12)

  • Figure 1: Overview of TIFO. Before training, we transform all samples into the frequency domain and measure their cross-sample stationarity at the dataset level (steps 1 & 2). These features are then used to learn frequency weights that reflect how stationary each frequency is (step 3). During training, each input sample is transformed into the frequency domain, weighted by the learned stationarity weights, and transformed back to the time domain to serve as input to the forecasting model (steps 4 & 5). TIFO is optimized with the forecasting loss, jointly with the backbone model.
  • Figure 2: Train-Test Distance Compactness. Visualization of the JSD$^{2}$ distance between the train and test amplitude distributions on the Electricity dataset. Each scatter point represents one frequency component; a smaller radius indicates a smaller distributional gap. Green and red points show the results before and after applying the learning method, respectively.
  • Figure 3: Frequency-domain distribution distance between the train and test sets. Both JSD$^{2}$ ($\downarrow$) and the KS statistic ($\downarrow$) are computed on amplitudes; bold marks the best result in each row.
  • Figure 4: Frequency-domain analysis of Fourier basis learning. From left to right: unprocessed spectra (Before), after FAN, and after applying TIFO. Each panel is a 3D plot with time, frequency, and amplitude axes. We visualize four Fourier basis waves, $\zeta_{\omega_{1:4}}$, to illustrate how each processing method alters the basis functions. In the frequency–amplitude plane, we plot three forecasting cases, showing the ground truth in blue and the forecast based on the processed input in red. Red diamonds mark key local peak frequencies.
  • Figure 5: Illustration of $z$-score normalization in the time and frequency domains. (Top) Data-generating distributions $p(x|t_i)$ for different temporal structures $t_i$ are aligned after $z$-scoring, sharing a common location and scale. (Bottom) Frequency-domain power spectra before and after $z$-scoring: the frequency-domain distributions remain divergent after normalization.
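Figures 2 and 3 compare train and test amplitude distributions one frequency component at a time. Below is a minimal sketch of the KS part of that measurement. It assumes the data are already cut into equal-length windows and that amplitudes come from a real FFT of each window. The windowing and the helper names are assumptions, not the paper's code. JSD$^{2}$ would additionally require a density or histogram estimate, which is omitted here.

```python
import numpy as np


def amplitude_spectra(windows: np.ndarray) -> np.ndarray:
    """rFFT amplitude of each window; shape (num_windows, length // 2 + 1)."""
    return np.abs(np.fft.rfft(windows, axis=-1))


def ks_statistic(a: np.ndarray, b: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])  # the supremum is attained at a sample point
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def per_frequency_ks(train: np.ndarray, test: np.ndarray) -> np.ndarray:
    """KS distance between train and test amplitude distributions, one value per bin."""
    amp_train, amp_test = amplitude_spectra(train), amplitude_spectra(test)
    return np.array([ks_statistic(amp_train[:, k], amp_test[:, k])
                     for k in range(amp_train.shape[1])])


# Synthetic windows: only the component at bin 5 changes between train and test.
rng = np.random.default_rng(0)
t = np.arange(64)
wave = np.sin(2 * np.pi * 5 * t / 64)
train = 1.0 * wave + 0.1 * rng.standard_normal((200, 64))
test = 3.0 * wave + 0.1 * rng.standard_normal((200, 64))
ks = per_frequency_ks(train, test)
print(int(np.argmax(ks)), ks[5])  # 5 1.0
```

Only the shifted component reaches the maximum KS distance of 1. The noise-only bins stay small because their train and test amplitudes come from the same distribution.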

Theorems & Definitions (2)

  • Theorem 1: Bochner's Theorem [Scholkopf-kernel]
  • Theorem 2: Mercer's Theorem [mercer1909mercer]
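The abstract claims that the Fourier transform implicitly induces an eigendecomposition. One simple special case illustrates this idea; it was chosen for illustration and is not the paper's proof. For a shift-invariant kernel on a periodic grid of $n$ time steps, the Gram matrix is circulant. Its eigenvectors are the discrete Fourier basis, and its eigenvalues are the DFT of the kernel's first row. This links the eigenfunction expansion of Mercer's theorem with the spectral view of Bochner's theorem. The grid size and kernel width below are arbitrary assumptions.

```python
import numpy as np

n, sigma = 32, 3.0
idx = np.arange(n)
# Shift-invariant kernel on a ring of n time steps: depends only on the circular lag.
lag = np.minimum(idx, n - idx)
row = np.exp(-lag**2 / (2 * sigma**2))
# Gram matrix K[i, j] = row[(j - i) mod n] is circulant (and symmetric).
K = row[(idx[None, :] - idx[:, None]) % n]

# Unitary DFT basis: column k is exp(2*pi*i*k*t/n) / sqrt(n).
F = np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)
# Eigenvalues are the DFT of the first row (real, because row is symmetric).
eigvals = np.fft.fft(row).real

# Eigendecomposition K = F diag(eigvals) F^H, with the Fourier vectors as eigenvectors.
K_rebuilt = (F * eigvals) @ F.conj().T
print(np.allclose(K, K_rebuilt))  # True
```

No eigen-solver is needed: a single FFT of one row yields the full spectrum of the shift-invariant kernel matrix.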