Time-Series Foundation AI Model for Value-at-Risk Forecasting
Anubha Goel, Puneet Pasricha, Juho Kanniainen
TL;DR
The paper investigates Value-at-Risk (VaR) forecasting with Google's TimesFM time-series foundation model, comparing zero-shot and fine-tuned deployments against traditional econometric methods (GARCH, GAS) and a non-parametric empirical distribution function (EDF) approach. Using 19 years of S&P 100 data covering 91 constituents, the authors train on 2005–2014 and test on 2015–2023, evaluating VaR at the 1%, 2.5%, 5%, and 10% levels. The key finding is that fine-tuned TimesFM substantially improves the actual-over-expected (AE) violation ratio and achieves quantile-loss performance competitive with or superior to the benchmarks, with the strongest gains in the extreme tails and at short horizons. Zero-shot TimesFM underperforms, underscoring the importance of domain-specific fine-tuning. The work demonstrates the potential of time-series foundation models for risk forecasting, while highlighting challenges in interpretability and regulatory acceptance and suggesting future research on volatility forecasting and portfolio applications.
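One common form of an EDF benchmark is historical simulation: the VaR forecast for the next day is an empirical quantile of recent returns. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a 250-day rolling window, NumPy's default linear-interpolation quantile, and VaR expressed as the alpha-quantile of returns (a negative number for small alpha). The helper name `edf_var` and the simulated data are hypothetical.

```python
import numpy as np

def edf_var(returns, alpha, window=250):
    """One-step-ahead VaR from the empirical distribution of a rolling window.

    Assumption: VaR is expressed as the alpha-quantile of returns (negative for
    small alpha), so a violation is a realized return below the forecast.
    """
    returns = np.asarray(returns, dtype=float)
    var = np.full(returns.shape, np.nan)  # no forecast until a full window exists
    for t in range(window, len(returns)):
        # The forecast for day t uses only days t - window .. t - 1 (no look-ahead).
        var[t] = np.quantile(returns[t - window:t], alpha)
    return var

# Usage on simulated fat-tailed daily returns (synthetic data, not the paper's).
rng = np.random.default_rng(0)
r = 0.01 * rng.standard_t(df=4, size=1000)
var_1pct = edf_var(r, alpha=0.01)
```

Because the forecast is a pure empirical quantile, it reacts to volatility changes only as fast as old observations leave the window, which is one reason parametric models such as GARCH and GAS are the natural comparison.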
Abstract
This study is the first to analyze the performance of a time-series foundation AI model for Value-at-Risk (VaR), which is essentially a forecast of the left-tail quantiles of returns. Foundation models, pre-trained on diverse datasets, can be applied in a zero-shot setting with minimal data or further improved through fine-tuning. We compare Google's TimesFM model to conventional parametric and non-parametric models, including GARCH and Generalized Autoregressive Score (GAS), using 19 years of daily returns from the S&P 100 index and its constituents. Backtesting with over 8.5 years of out-of-sample data shows that the fine-tuned foundation model consistently outperforms traditional methods in actual-over-expected ratios. For the quantile score loss function, it performs comparably to the best econometric model, GAS. Overall, the foundation model ranks as the best or among the top performers in forecasting the 0.01, 0.025, 0.05, and 0.1 quantiles. Fine-tuning significantly improves accuracy, showing that zero-shot use is not optimal for VaR.
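The two backtesting criteria named in the abstract, the actual-over-expected ratio and the quantile score, can be written compactly. The following is a minimal sketch, assuming VaR is reported as the alpha-quantile of returns and using the standard pinball (tick) loss as the quantile score; the paper's exact sign convention and normalization may differ, and the helper names `ae_ratio` and `quantile_score` are hypothetical. An AE ratio above 1 means more violations than the level implies (risk underestimated); below 1 means the forecast is too conservative.

```python
import numpy as np

def ae_ratio(returns, var, alpha):
    """Actual-over-expected violation ratio; 1.0 indicates correct calibration."""
    returns = np.asarray(returns, dtype=float)
    var = np.asarray(var, dtype=float)
    actual = np.sum(returns < var)     # days on which the return fell below VaR
    expected = alpha * returns.size    # violations implied by the level alpha
    return actual / expected

def quantile_score(returns, var, alpha):
    """Mean pinball (tick) loss of alpha-quantile forecasts; lower is better."""
    returns = np.asarray(returns, dtype=float)
    var = np.asarray(var, dtype=float)
    hit = (returns < var).astype(float)  # 1 on a violation, 0 otherwise
    return np.mean((alpha - hit) * (returns - var))

# Toy backtest: five days, a constant 10% VaR forecast of -2%.
r = np.array([-0.03, 0.01, -0.005, 0.02, -0.04])
v = np.full(r.shape, -0.02)
ae = ae_ratio(r, v, alpha=0.1)        # 2 violations / (0.1 * 5) expected = 4.0
qs = quantile_score(r, v, alpha=0.1)  # (0.009 + 0.003 + 0.0015 + 0.004 + 0.018) / 5
```

The AE ratio only counts violations, whereas the pinball loss is non-negative and is minimized in expectation by the true alpha-quantile, so it also rewards forecasts that sit close to the realized returns. This is why the two criteria can rank models differently.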
