Frequency Matters: When Time Series Foundation Models Fail Under Spectral Shift

Tianze Wang, Sofiane Ennadir, John Pertoft, Gabriela Zarzar Gandler, Lele Cao, Zineb Senane, Styliani Katsarou, Sahar Asadi, Axel Karlsson, Oleg Smirnov

TL;DR

The paper investigates why Time Series Foundation Models (TSFMs) often fail to generalize in industrial settings despite strong results on public benchmarks, and identifies spectral shift as a central cause. It combines an industrial player engagement prediction task with controlled synthetic experiments to show that TSFMs degrade when the dominant frequencies of the downstream task differ from those seen during pretraining. Models such as MOMENT underperform domain-adapted baselines when frequency bands are mismatched, and the paper provides a synthetic protocol for testing spectral robustness. The authors argue for frequency-aware pretraining and evaluation. They propose quantifying spectral overlap, augmenting training with frequency-aware techniques, and adopting benchmarks that stress spectral diversity, with the goal of improving the real-world robustness of TSFMs.

Abstract

Time series foundation models (TSFMs) have shown strong results on public benchmarks, prompting comparisons to a "BERT moment" for time series. Their effectiveness in industrial settings, however, remains uncertain. We examine why TSFMs often struggle to generalize and highlight spectral shift (a mismatch between the dominant frequency components in downstream tasks and those represented during pretraining) as a key factor. We present evidence from an industrial-scale player engagement prediction task in mobile gaming, where TSFMs underperform domain-adapted baselines. To isolate the mechanism, we design controlled synthetic experiments contrasting signals with seen versus unseen frequency bands, observing systematic degradation under spectral mismatch. These findings position frequency awareness as critical for robust TSFM deployment and motivate new pretraining and evaluation protocols that explicitly account for spectral diversity.
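
To make the notion of spectral shift concrete, the sketch below estimates a series' dominant frequency and a simple overlap score between two normalized power spectra. It is an illustration only, not the authors' measurement procedure: the function names, the histogram-intersection overlap score, and the toy sine inputs are all assumptions introduced here. It uses a naive DFT from the standard library to stay self-contained; in practice a call such as `numpy.fft.rfft` would be the usual choice.

```python
import cmath
import math


def power_spectrum(x):
    """One-sided power spectrum of a real sequence, computed with a naive O(n^2) DFT."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]  # remove the DC offset before transforming
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def dominant_frequency(x, sample_rate=1.0):
    """Frequency (in cycles per unit time) of the strongest non-DC spectral bin."""
    spec = power_spectrum(x)
    k = max(range(1, len(spec)), key=spec.__getitem__)
    return k * sample_rate / len(x)


def spectral_overlap(x, y):
    """Histogram intersection of two normalized power spectra, in [0, 1].

    Hypothetical score for illustration: 1 means identical spectral shape,
    0 means disjoint support. Assumes equal-length, non-constant inputs.
    """
    px, py = power_spectrum(x), power_spectrum(y)
    sx, sy = sum(px), sum(py)
    return sum(min(a / sx, b / sy) for a, b in zip(px, py))


# Toy inputs (assumed): a slow and a fast sine, both 128 samples long.
n = 128
low = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]    # 4 cycles per window
high = [math.sin(2 * math.pi * 32 * t / n) for t in range(n)]  # 32 cycles per window

print(dominant_frequency(low), dominant_frequency(high))
print(spectral_overlap(low, high), spectral_overlap(low, low))
```

With these toy inputs the two dominant frequencies fall in different bins and the overlap between the slow and fast sine is near zero, which is the kind of mismatch the abstract calls spectral shift. A low overlap between downstream data and a pretraining corpus would be one possible warning sign, under the assumptions above.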

Paper Structure

This paper contains 16 sections, 4 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: An example MTS sample.
  • Figure 2: Analysis of the dominant frequencies in our dataset and in the datasets used for pretraining MOMENT (FordA and FaultDetectionA).
  • Figure 3: Synthetic series from seen vs. unseen frequency bands. Top: time domain; Bottom: spectra.
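
A seen-versus-unseen band comparison of the kind described for Figure 3 could be set up roughly as follows. This is a minimal sketch under stated assumptions, not the paper's protocol: the band limits, series length, number of sinusoidal components, noise level, and the names `synth_series`, `SEEN_BAND`, and `UNSEEN_BAND` are all hypothetical choices made for illustration.

```python
import math
import random


def synth_series(band, length=256, n_components=3, noise_std=0.1, seed=0):
    """Sum of sinusoids with frequencies drawn uniformly from `band`, plus Gaussian noise.

    `band` is a (low, high) pair in cycles per sample; values must stay below 0.5
    (the Nyquist limit) to avoid aliasing.
    """
    rng = random.Random(seed)
    lo, hi = band
    freqs = [rng.uniform(lo, hi) for _ in range(n_components)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_components)]
    return [
        sum(math.sin(2.0 * math.pi * f * t + p) for f, p in zip(freqs, phases))
        + rng.gauss(0.0, noise_std)
        for t in range(length)
    ]


def zero_crossings(x):
    """Count sign changes between consecutive samples (a crude frequency proxy)."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)


# Hypothetical bands: one standing in for frequencies covered during pretraining,
# one standing in for frequencies absent from it.
SEEN_BAND = (0.01, 0.05)
UNSEEN_BAND = (0.20, 0.40)

seen = synth_series(SEEN_BAND, seed=1)
unseen = synth_series(UNSEEN_BAND, seed=2)

print(zero_crossings(seen), zero_crossings(unseen))
```

Fixing the seed makes each series reproducible, so the same seen and unseen sets can be reused across models. A model would then be evaluated separately on series from each band, and the gap between the two scores read as a rough indicator of sensitivity to spectral mismatch.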