Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization

Luca Masserano, Abdul Fatir Ansari, Boran Han, Xiyuan Zhang, Christos Faloutsos, Michael W. Mahoney, Andrew Gordon Wilson, Youngsuk Park, Syama Rangapuram, Danielle C. Maddix, Yuyang Wang

TL;DR

This study addresses how to design an effective discrete vocabulary for real-valued time series in foundation models. It introduces WaveToken, a wavelet-based tokenizer that scales the input, decomposes it into time-localized frequency bands via a maximally decimated discrete wavelet transform, thresholds and quantizes the coefficients, and pre-trains an autoregressive model to forecast the resulting wavelet tokens. The vocabulary is compact (1024 tokens) yet expressive enough to encode trends, sparse spikes, and non-stationary signals, and it supports coarse-to-fine forecasting across scales. On a benchmark of 42 datasets covering in-domain and zero-shot settings, WaveToken is more accurate than recently proposed foundation models, performs on par with or better than deep learning models trained specifically on each dataset, and achieves the best average rank across three complementary metrics.

Abstract

How to best develop foundational models for time series forecasting remains an important open question. Tokenization is a crucial consideration in this effort: what is an effective discrete vocabulary for a real-valued sequential input? To address this question, we develop WaveToken, a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localized frequencies. Our method first scales and decomposes the input time series, then thresholds and quantizes the wavelet coefficients, and finally pre-trains an autoregressive model to forecast coefficients for the forecast horizon. By decomposing coarse and fine structures in the inputs, wavelets provide an eloquent and compact language for time series forecasting that simplifies learning. Empirical results on a comprehensive benchmark, including 42 datasets for both in-domain and zero-shot settings, show that WaveToken: i) provides better accuracy than recently proposed foundation models for forecasting while using a much smaller vocabulary (1024 tokens), and performs on par with or better than modern deep learning models trained specifically on each dataset; and ii) exhibits superior generalization capabilities, achieving the best average rank across all datasets for three complementary metrics. In addition, we show that our method can easily capture complex temporal patterns of practical relevance that are challenging for other recent pre-trained models, including trends, sparse spikes, and non-stationary time series with varying frequencies evolving over time.
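The scale, decompose, threshold, and quantize steps described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes a Haar wavelet, uniform quantization bins over a fixed range (the paper instead sizes bins from the empirical coefficient distribution), and a context length divisible by $2^J$. The function names and the parameters `levels`, `tau`, `limit`, and `vocab_size` are hypothetical and chosen only for illustration.

```python
import math

def scale(context):
    """Z-score the series with the mean and std of the context window."""
    mu = sum(context) / len(context)
    var = sum((x - mu) ** 2 for x in context) / len(context)
    sigma = math.sqrt(var) or 1.0  # fall back to 1.0 for a constant series
    return [(x - mu) / sigma for x in context], mu, sigma

def haar_dwt(x, levels):
    """Maximally decimated Haar DWT; returns bands [a_J, d_J, ..., d_1]."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return [approx] + details

def haar_idwt(coeffs):
    """Inverse of haar_dwt: rebuild the signal from [a_J, d_J, ..., d_1]."""
    approx = list(coeffs[0])
    for detail in coeffs[1:]:
        nxt = []
        for a, d in zip(approx, detail):
            nxt.extend([(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)])
        approx = nxt
    return approx

def tokenize(context, levels=2, tau=0.0, limit=8.0, vocab_size=1024):
    """Scale -> DWT -> threshold details -> uniform quantization (assumed)."""
    scaled, mu, sigma = scale(context)
    coeffs = haar_dwt(scaled, levels)
    # Zero out small detail coefficients; the approximation band is kept as is.
    coeffs = [coeffs[0]] + [[0.0 if abs(c) < tau else c for c in band]
                            for band in coeffs[1:]]
    width = 2 * limit / vocab_size
    tokens = [min(vocab_size - 1, max(0, int((c + limit) // width)))
              for band in coeffs for c in band]
    return tokens, (mu, sigma)

def detokenize(tokens, stats, levels=2, limit=8.0, vocab_size=1024):
    """Map tokens to bin centers, invert the DWT, and undo the scaling."""
    mu, sigma = stats
    width = 2 * limit / vocab_size
    values = [-limit + (t + 0.5) * width for t in tokens]
    n = len(values) >> levels  # length of a_J (and of d_J)
    coeffs, pos = [values[:n]], n
    for _ in range(levels):
        coeffs.append(values[pos:pos + n])
        pos += n
        n *= 2  # each finer detail band is twice as long
    return [mu + sigma * v for v in haar_idwt(coeffs)]

# Toy series with a trend plus a period-4 pattern (illustrative data only).
series = [float(t % 4) + 0.1 * t for t in range(16)]
tokens, stats = tokenize(series)
recon = detokenize(tokens, stats)
```

In this sketch the token sequence has the same length as the input, and the round trip differs from the original series only by quantization error (plus any coefficients removed by `tau`). In the method described above, a pre-trained autoregressive model would predict the tokens for the forecast horizon, which would then pass through the inverse step.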

Paper Structure

This paper contains 28 sections, 7 equations, 15 figures, 9 tables, 1 algorithm.

Figures (15)

  • Figure 1: WaveToken-Base (199M parameters) provides excellent forecasts with very low uncertainty. Performance of different foundation models for time series forecasting on complex patterns of practical relevance: Chronos-Base (201M), TimesFM (200M) and Moirai-Large (311M) struggle to capture exponential trends (top row), sparse spikes (second row), and non-stationary signals with 2 and 5 frequencies evolving over time (bottom two rows).
  • Figure 2: High-level depiction of our method. (Left) WaveToken first re-scales the input time series by computing $\tilde{x}_t = (x_t - \mu_{1:C})/\sigma_{1:C}$, then it applies the DWT and possibly thresholds the resulting detail coefficients to zero (red crosses). The wavelet coefficients are finally quantized to bins of optimal size given their empirical distribution, and then concatenated together (excluding the first $J-1$ approximations). (Right) After pretraining on a large corpus of time series, at inference time the model samples autoregressively from the categorical output distribution and yields coefficients at all decomposition levels, which are pushed through the inverse tokenizer to obtain a forecast.
  • Figure 3: WaveToken performs on par with or better than the baselines on in-domain datasets. Forecasting accuracy on Benchmark I in terms of WQL, MASE and VRSE.
  • Figure 4: WaveToken performs on par with or better than the baselines, including task-specific models, on zero-shot datasets. Forecasting accuracy on Benchmark II in terms of WQL, MASE and VRSE.
  • Figure 5: Wavelet-based tokenization induces structured patterns in the cross-attention maps. Cross-attention weights for the eighth decoder layer when forecasting the spiky data of Figure 1 (second row). Chronos-Base (left) repeats the same patterns for all steps, while WaveToken-Base (right) shows high values at the detail coefficients corresponding to the spikes.
  • ...and 10 more figures
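
The concatenation described in the Figure 2 caption can be checked with a short count of coefficients. This is a sketch under two assumptions that the caption does not state: the context length $C$ is divisible by $2^J$, and the wavelet and boundary handling preserve length (as with Haar; longer filters with padding generally produce a few extra coefficients). Keeping only the level-$J$ approximation $a_J$ together with all detail bands $d_J, \dots, d_1$ then yields exactly as many coefficients as input points:

```latex
% Band layout after a J-level maximally decimated DWT of a length-C input
[\, a_J,\; d_J,\; d_{J-1},\; \dots,\; d_1 \,],
\qquad |a_J| = \frac{C}{2^J}, \quad |d_j| = \frac{C}{2^j} \;\; (j = 1, \dots, J)

% Total number of coefficients (hence tokens) equals the input length
\frac{C}{2^J} + \sum_{j=1}^{J} \frac{C}{2^j}
  = \frac{C}{2^J} + C\left(1 - \frac{1}{2^J}\right)
  = C

% Hypothetical example: C = 512, J = 3
64 + 64 + 128 + 256 = 512
```

The values $C = 512$ and $J = 3$ are illustrative and are not taken from the paper.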