Time-Series Foundation AI Model for Value-at-Risk Forecasting

Anubha Goel, Puneet Pasricha, Juho Kanniainen

TL;DR

The paper investigates Value-at-Risk (VaR) forecasting with Google's TimesFM time-series foundation model, comparing zero-shot and fine-tuned deployments against traditional econometric methods (GARCH, GAS) and the non-parametric EDF. Using 19 years of S&P 100 data across 91 constituents, the authors train on 2005–2014 and test on 2015–2023, evaluating VaR at the 1%, 2.5%, 5%, and 10% levels. The key finding is that fine-tuned TimesFM brings actual-over-expected violation ratios substantially closer to target and achieves competitive or superior quantile-loss performance relative to the benchmarks, with the strongest gains at the extreme tails and for short horizons. Zero-shot TimesFM underperforms, which establishes the importance of domain-specific fine-tuning. The work demonstrates the potential of time-series foundation models in risk forecasting, highlights challenges in interpretability and regulatory acceptance, and suggests directions for future research in volatility forecasting and portfolio applications.

Abstract

This study is the first to analyze the performance of a time-series foundation AI model for Value-at-Risk (VaR), which essentially forecasts the left-tail quantiles of returns. Foundation models, pre-trained on diverse datasets, can be applied in a zero-shot setting with minimal data or further improved through fine-tuning. We compare Google's TimesFM model to conventional parametric and non-parametric models, including GARCH and Generalized Autoregressive Score (GAS), using 19 years of daily returns from the S&P 100 index and its constituents. Backtesting with over 8.5 years of out-of-sample data shows that the fine-tuned foundation model consistently outperforms traditional methods in actual-over-expected ratios. For the quantile score loss function, it performs comparably to the best econometric model, GAS. Overall, the foundation model ranks as the best or among the top performers across the 0.01, 0.025, 0.05, and 0.1 quantile forecasts. Fine-tuning significantly improves accuracy, showing that zero-shot use is not optimal for VaR.
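The two headline metrics named in the abstract can be illustrated with a minimal sketch. It assumes VaR is expressed as a return quantile (a negative number for the left tail) and that a violation is a day on which the realized return falls below the forecast; the function names `ae_ratio` and `quantile_loss` are illustrative and do not come from the paper.

```python
def ae_ratio(returns, var_forecasts, alpha):
    """Actual-over-expected ratio: observed violations divided by alpha * n.

    Assumption: a violation occurs when the realized return is strictly
    below the forecast alpha-quantile. A value near 1 indicates good coverage.
    """
    violations = sum(1 for r, q in zip(returns, var_forecasts) if r < q)
    expected = alpha * len(returns)
    return violations / expected


def quantile_loss(returns, var_forecasts, alpha):
    """Average pinball (quantile score) loss of alpha-quantile forecasts."""
    total = 0.0
    for r, q in zip(returns, var_forecasts):
        diff = r - q
        # Misses below the forecast are weighted by (1 - alpha),
        # outcomes above it by alpha.
        total += (alpha - 1.0) * diff if diff < 0 else alpha * diff
    return total / len(returns)


# Toy usage with hypothetical numbers (not data from the paper).
toy_returns = [-0.03, 0.01, -0.01, 0.02]
toy_var = [-0.02, -0.02, -0.02, -0.02]
print(ae_ratio(toy_returns, toy_var, alpha=0.25))
print(quantile_loss(toy_returns, toy_var, alpha=0.25))
```

Lower quantile loss is better, while the actual-over-expected ratio is judged by its distance from 1, so the two metrics reward different aspects of a forecast.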

Paper Structure

This paper contains 11 sections, 12 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: TimesFM structure. Note: From Das et al. (2024), "A Decoder-Only Foundation Model for Time-Series Forecasting," Proceedings of the Forty-First International Conference on Machine Learning (ICML). Copyright 2024 by the Authors. The figure presents the architecture of TimesFM, a time-series foundation model built on a decoder-only transformer framework.
  • Figure 2: The bar plot of the number of assets (out of 92) for which we failed to reject the null hypothesis in the Unconditional Coverage (UC) test, at various value-at-risk (VaR) levels, on out-of-sample forecasts for the period January 2015 to September 2023. A higher number indicates better model performance, as it reflects a closer alignment between observed and expected VaR violations for each specified model. Results are reported for four VaR confidence levels (1%, 2.5%, 5%, and 10%) across different models.
  • Figure 3: The bar plot of the number of assets (out of 92) for which we failed to reject the null hypothesis in the Conditional Coverage (CC) test, at various value-at-risk (VaR) levels, on out-of-sample forecasts for the period January 2015 to September 2023. A higher number indicates better model performance, as it reflects a closer alignment between observed and expected VaR violations for each specified model. Results are reported for four VaR confidence levels (1%, 2.5%, 5%, and 10%) across different models.
  • Figure 4: The bar plot of the number of assets (out of 92) for which we failed to reject the null hypothesis in the Dynamic Quantile (DQ) test, at various value-at-risk (VaR) levels, on out-of-sample forecasts for the period January 2015 to September 2023. A higher number indicates better model performance, as it reflects a closer alignment between observed and expected VaR violations for each specified model. Results are reported for four VaR confidence levels (1%, 2.5%, 5%, and 10%) across different models.
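
The unconditional coverage test counted in Figure 2 is commonly implemented as Kupiec's likelihood-ratio test; whether the paper uses exactly this form is an assumption. The sketch below is standard-library Python, and the name `kupiec_uc_test` is illustrative. Under the null hypothesis that the violation rate equals the VaR level, the statistic is asymptotically chi-square with one degree of freedom, so "failed to reject" in the captions corresponds to a p-value above the chosen significance level.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def kupiec_uc_test(violations, alpha):
    """Kupiec likelihood-ratio test of unconditional coverage.

    violations: sequence of booleans, True on days the return breached VaR.
    alpha: nominal VaR level, e.g. 0.01.
    Returns (lr_statistic, p_value).
    """
    n = len(violations)
    x = sum(1 for v in violations if v)
    p_hat = x / n  # observed violation rate
    log_lik_null = _xlogy(n - x, 1.0 - alpha) + _xlogy(x, alpha)
    log_lik_alt = _xlogy(n - x, 1.0 - p_hat) + _xlogy(x, p_hat)
    # Clip at zero to guard against tiny negative values from rounding.
    lr = max(0.0, -2.0 * (log_lik_null - log_lik_alt))
    # Survival function of chi-square(1): P(Z^2 > lr) = erfc(sqrt(lr / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# Toy usage with hypothetical hit sequences (not data from the paper).
well_calibrated = [True] * 5 + [False] * 95   # 5% violations at alpha = 0.05
too_many = [True] * 20 + [False] * 80         # 20% violations at alpha = 0.05
print(kupiec_uc_test(well_calibrated, 0.05))
print(kupiec_uc_test(too_many, 0.05))
```

The test looks only at the number of violations, not their timing, which is why the conditional coverage and dynamic quantile tests in Figures 3 and 4 are reported alongside it.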