Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting
Xinghong Fu, Yanhong Li, Georgios Papaioannou, Yoon Kim
TL;DR
The paper tackles the high computational cost of time series foundation models (TSFMs) by proposing Reverso, a compact decoder-based TSFM that achieves strong zero-shot forecasting with only $0.2M$ to $2.6M$ parameters. It combines a hybrid sequence-mixing architecture (long convolutions and DeltaNet) with an attention-based decoder, a lightweight embedding, targeted data augmentation, Gaussian-process synthetic data, and FFT-based downsampling to expand effective context. Empirical results show Reverso attains competitive or superior performance on GiftEval and LTSF benchmarks relative to much larger baselines, thereby pushing the efficiency-performance Pareto frontier. The work provides a practical recipe for compact TSFMs, with ablations highlighting the value of the hybrid architecture and inference strategies, and discusses limitations and future work in multivariate forecasting and uncertainty quantification.
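To make the "Gaussian-process synthetic data" ingredient concrete, the following is a minimal sketch of drawing one synthetic series from a GP prior. The kernel choice (a smooth RBF trend plus a periodic component), the hyperparameter values, and the function name `sample_gp_series` are illustrative assumptions; the summary above does not specify the kernels or settings Reverso actually uses.

```python
import numpy as np

def sample_gp_series(length=256, period=24.0, length_scale=50.0,
                     jitter=1e-5, seed=0):
    """Draw one series from a zero-mean GP prior (illustrative kernels only)."""
    rng = np.random.default_rng(seed)
    t = np.arange(length, dtype=float)
    diff = t[:, None] - t[None, :]

    # Assumed kernels: RBF for slow trend, exp-sine-squared for seasonality.
    k_trend = np.exp(-0.5 * (diff / length_scale) ** 2)
    k_season = np.exp(-2.0 * np.sin(np.pi * diff / period) ** 2)

    # Jitter on the diagonal keeps the Cholesky factorization numerically stable.
    cov = k_trend + k_season + jitter * np.eye(length)
    chol = np.linalg.cholesky(cov)

    # If z ~ N(0, I), then chol @ z ~ N(0, cov).
    return chol @ rng.standard_normal(length)

series = sample_gp_series(length=128, seed=0)
```

Varying the kernel mix and hyperparameters per draw is one common way to obtain a diverse synthetic corpus; whether Reverso does so is not stated here.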
Abstract
Learning time series foundation models has been shown to be a promising approach for zero-shot time series forecasting across diverse time series domains. Insofar as scaling has been a critical driver of performance of foundation models in other modalities such as language and vision, much recent work on time series foundation modeling has focused on scaling. This has resulted in time series foundation models with hundreds of millions of parameters that, while performant, are inefficient and expensive to use in practice. This paper describes a simple recipe for learning efficient foundation models for zero-shot time series forecasting that are orders of magnitude smaller. We show that large-scale transformers are not necessary: small hybrid models that interleave long convolution and linear RNN layers (in particular DeltaNet layers) can match the performance of larger transformer-based models while being more than a hundred times smaller. We also describe several data augmentation and inference strategies that further improve performance. This recipe results in Reverso, a family of efficient time series foundation models for zero-shot forecasting that significantly push the performance-efficiency Pareto frontier.
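For readers unfamiliar with DeltaNet, the linear RNN layer named in the abstract, the sketch below shows the delta-rule recurrence in its simplest sequential form: a matrix-valued state is corrected toward each new value along the direction of its key. This is a single-head reference loop written for clarity, not the Reverso implementation; the function name `delta_rule_scan`, the key normalization, and the absence of learned projections, gating, and chunked parallel computation are all simplifying assumptions.

```python
import numpy as np

def delta_rule_scan(q, k, v, beta):
    """Sequential delta-rule recurrence (single head, illustrative).

    q, k: arrays of shape (T, d_k); v: shape (T, d_v); beta: shape (T,),
    with each beta[t] in [0, 1] acting as a per-step write strength.
    Returns outputs of shape (T, d_v).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    state = np.zeros((d_v, d_k))   # associative memory mapping keys to values
    out = np.empty((T, d_v))
    for t in range(T):
        # Assumption: keys are L2-normalized so that beta = 1 fully overwrites.
        k_t = k[t] / (np.linalg.norm(k[t]) + 1e-8)
        predicted = state @ k_t                       # value currently stored for k_t
        state = state + beta[t] * np.outer(v[t] - predicted, k_t)  # delta update
        out[t] = state @ q[t]                         # read out with the query
    return out

# Writing two different values under the same key: the second replaces the first.
k = np.array([[1.0, 0.0], [1.0, 0.0]])
v = np.array([[3.0, 4.0], [-1.0, 2.0]])
out = delta_rule_scan(q=k, k=k, v=v, beta=np.array([1.0, 1.0]))
```

The usage lines illustrate the property that distinguishes the delta rule from purely additive linear attention: a repeated key overwrites its stored value rather than accumulating on top of it.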
