Performance of Zero-Shot Time Series Foundation Models on Cloud Data
William Toner, Thomas L. Lee, Artjom Joosen, Rajkarn Singh, Martin Asenov
TL;DR
This paper evaluates zero-shot time series foundation models (FMs) on cloud data from Huawei Cloud and questions their claimed cross-domain generalization. The empirical study covers several FMs (e.g., VisionTS, Moirai, TimesFM, Chronos, TTM, Mamba4Cast) and compares them with online ridge regression and naive seasonal baselines, using rolling-window forecasting and scaled error metrics such as MASE and RMSSE. All FMs underperform the baselines on cloud data and exhibit notable pathologies, including chaotic forecasts and failures to capture seasonality or spikes; VisionTS often behaves like a naive seasonal forecaster. The findings challenge broad generalization claims for zero-shot FMs in cloud settings and motivate conditioning or fine-tuning approaches that adapt FMs to spiky, strongly periodic time series so that they become practically useful for cloud forecasting.
Abstract
Time series foundation models (FMs) have emerged as a popular paradigm for zero-shot multi-domain forecasting. FMs are trained on numerous diverse datasets and claim to be effective forecasters across many time series domains, including cloud data. In this work we investigate this claim, exploring the effectiveness of FMs on cloud data. We demonstrate that many well-known FMs fail to generate meaningful or accurate zero-shot forecasts in this setting. We support this claim empirically, showing that FMs are consistently outperformed by simple linear baselines. We also illustrate a number of interesting pathologies, including instances where FMs suddenly output seemingly erratic, random-looking forecasts. Our results suggest a widespread failure of FMs to model cloud data.
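
The evaluation summarized above scores forecasts with MASE and RMSSE and uses a naive seasonal forecaster as one baseline. The following is a minimal sketch of those pieces, not the authors' code. It assumes the commonly used definitions in which the forecast error is scaled by the in-sample error of a lag-`period` naive forecast; the paper's exact scaling may differ. The function names `seasonal_naive`, `mase`, and `rmsse` and the toy series are hypothetical and chosen for illustration only.

```python
from math import sqrt


def seasonal_naive(history, horizon, period):
    """Forecast by repeating the last full season of `history`."""
    last_season = history[-period:]
    return [last_season[i % period] for i in range(horizon)]


def _lagged_diffs(history, period):
    # In-sample errors of a lag-`period` naive forecast (assumed scaling).
    return [history[i] - history[i - period] for i in range(period, len(history))]


def mase(y_true, y_pred, history, period=1):
    """Mean absolute error divided by the in-sample naive MAE."""
    diffs = _lagged_diffs(history, period)
    scale = sum(abs(d) for d in diffs) / len(diffs)  # assumes a non-constant history
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return mae / scale


def rmsse(y_true, y_pred, history, period=1):
    """Root of the mean squared error divided by the in-sample naive MSE."""
    diffs = _lagged_diffs(history, period)
    scale = sum(d * d for d in diffs) / len(diffs)  # assumes a non-constant history
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return sqrt(mse / scale)


# Toy series with period 4 and a single spike at the end (made-up data).
history = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 5.0]
actual = [1.0, 2.0, 3.0, 4.0]
forecast = seasonal_naive(history, horizon=4, period=4)

print(forecast)
print(mase(actual, forecast, history, period=4))
print(rmsse(actual, forecast, history, period=4))
```

Under these definitions, a value above 1 means the forecast is worse on the evaluation window than the naive forecast was in-sample, which is the kind of comparison the summary reports for the FMs. In a rolling-window evaluation, the same functions would be applied to each window, with the history updated as the window advances.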
