Performance of Zero-Shot Time Series Foundation Models on Cloud Data

William Toner, Thomas L. Lee, Artjom Joosen, Rajkarn Singh, Martin Asenov

TL;DR

The paper evaluates zero-shot time-series foundation models on cloud data from Huawei Cloud, questioning their claimed cross-domain generalization. It conducts an empirical study across multiple FMs (e.g., VisionTS, Moirai, TimesFM, Chronos, TTM, Mamba4Cast) and compares them to online ridge regression and naive seasonal baselines using rolling-window forecasting and metrics like $\text{MASE}$ and $\text{RMSSE}$. The results show all FMs underperform baselines on cloud data, with notable pathologies such as chaotic forecasts and failure to capture seasonality or spikes; VisionTS often behaves like a naive seasonal forecaster, contributing to the overall degradation. The findings challenge broad generalization claims for zero-shot FMs in cloud contexts and motivate development of conditioning or fine-tuning approaches to adapt FMs to time-series data characterized by spikes and strong periodicity, to make them practically useful for cloud forecasting.
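The two scaled metrics named above can be sketched as follows. This is a minimal illustration using the commonly cited definitions, in which the forecast error is divided by the in-sample error of a lagged naive forecast. The function names, the `season` parameter, and the choice of lag are assumptions made here for illustration; the paper's exact scaling may differ.

```python
import math


def _lagged_diffs(context, season):
    # In-sample errors of a naive forecast that repeats the value `season` steps back.
    return [context[i] - context[i - season] for i in range(season, len(context))]


def mase(context, actual, forecast, season=1):
    """Mean absolute scaled error (assumed definition, not taken from the paper)."""
    diffs = _lagged_diffs(context, season)
    scale = sum(abs(d) for d in diffs) / len(diffs)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


def rmsse(context, actual, forecast, season=1):
    """Root mean squared scaled error (assumed definition, not taken from the paper)."""
    diffs = _lagged_diffs(context, season)
    scale = sum(d * d for d in diffs) / len(diffs)
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse / scale)
```

Under these definitions a value below 1 means the forecast beats the lagged naive forecast on average, which is what makes the metrics comparable across series with different magnitudes.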

Abstract

Time series foundation models (FMs) have emerged as a popular paradigm for zero-shot multi-domain forecasting. FMs are trained on numerous diverse datasets and claim to be effective forecasters across multiple different time series domains, including cloud data. In this work we investigate this claim, exploring the effectiveness of FMs on cloud data. We demonstrate that many well-known FMs fail to generate meaningful or accurate zero-shot forecasts in this setting. We support this claim empirically, showing that FMs are outperformed consistently by simple linear baselines. We also illustrate a number of interesting pathologies, including instances where FMs suddenly output seemingly erratic, random-looking forecasts. Our results suggest a widespread failure of FMs to model cloud data.

Paper Structure

This paper contains 16 sections, 2 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Demonstration of the pathological behaviour of zero-shot FMs on cloud data. The figure shows three consecutive forecasts for the Moirai FM on the Huawei Cloud D2 dataset. The forecasts shown are those produced when forecasting starts at time steps $t-1$, $t$, and $t+1$, where $t=7130$. The blue curves show the context Moirai is given to construct the forecast. The plot shows that with only a small change in the context, Moirai's forecasts can change from reasonable predictions (at time steps $t-1$ and $t+1$) to inaccurate and chaotic ones (at time step $t$).
  • Figure 2: FM forecasts at the same time-step ($\bm{t=6030}$) for the third channel of the Huawei Cloud D2 dataset. The plots show the forecasts for the TimesFM, TTM, Mamba4Cast and Chronos FMs. None of the FM forecasts is accurate. TimesFM, Mamba4Cast and Chronos do not identify the seasonal pattern. TTM performs better than the rest but does not forecast the periodic spike in demand, incurring a large error at that point.
  • Figure 3: An example forecast for VisionTS, demonstrating that it is roughly equivalent to a naive seasonal forecaster for cloud data. Because the VisionTS forecast is roughly the same as the last seasonal period in the context, it is very similar to the naive seasonal forecast.
  • Figure 4: FM and baseline forecasts at the same time-step ($\bm{t=6030}$) for the third channel of the Huawei Cloud D2 dataset. These plots show the forecasts for the FMs not shown in Figure 2: Moirai and VisionTS. They also show the forecasts for the two baseline models: the online linear and naive seasonal forecasters. As in Figure 2, the FMs do not perform as well as the baselines. Moirai gives a poor forecast and does not predict the spike. As discussed in Section \ref{sec:why}, VisionTS gives a forecast similar to that of the naive seasonal forecaster but dampened, and so worse. This is in contrast to the two baseline methods, which predict accurately and, importantly, forecast the spike in the data.
  • Figure 5: FM and baseline forecasts at the same time-step ($\bm{t=6000}$) for the third channel of the Huawei Cloud D2 dataset. The figure shows an additional set of forecasts for the FMs and the baselines. These forecasts occur $30$ time steps before the ones shown in Figure 2. The figure shows that the FMs suffer from the same failures as in Figure 2, giving evidence that these problems occur frequently. None of the FMs performs as well as the simple baselines.
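
The naive seasonal baseline referred to in the captions above, together with rolling-window evaluation, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the `period`, `context_len`, `horizon` and `step` parameters, and the windowing scheme are all assumptions introduced here.

```python
def naive_seasonal_forecast(context, horizon, period):
    """Repeat the last `period` values of the context across the forecast horizon."""
    if len(context) < period:
        raise ValueError("context must contain at least one full seasonal period")
    last_season = context[-period:]
    return [last_season[i % period] for i in range(horizon)]


def rolling_forecasts(series, context_len, horizon, period, step=1):
    """Yield (start, forecast, actual) for each forecast origin in a rolling window.

    Hypothetical evaluation loop: at each origin `start`, the forecaster sees the
    previous `context_len` values and predicts the next `horizon` values.
    """
    for start in range(context_len, len(series) - horizon + 1, step):
        context = series[start - context_len:start]
        forecast = naive_seasonal_forecast(context, horizon, period)
        actual = series[start:start + horizon]
        yield start, forecast, actual
```

On a perfectly periodic series this forecaster is exact at every origin, which helps explain why it is a strong reference point for data with pronounced seasonality.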