Data-Efficient Realized Volatility Forecasting with Vision Transformers

Emi Soroka, Artem Arzyn

TL;DR

The paper tackles forecasting realized volatility over the next 30 days from the implied volatility (IV) surface of options data when historical samples are limited. It adapts Vision Transformer (ViT) architectures to treat the IV surface as a one-channel image of size $10\times 36$, augmented with seasonal features, and trains a 4-layer MLP regressor to predict realized volatility, measuring performance by $R^2$ and comparing deep and wide ViTs to an MLP baseline. Key findings show that ViTs can learn nonlinear patterns from IV surfaces: the best ViT, with 1.7M parameters, achieves approximately $R^2=0.41$ when trained on 2012–2021 data and tested on 2022. Smaller ViTs are data-efficient, while larger ones require longer training, and market disruption during the COVID pandemic reduces performance. The results illustrate a data-efficient transformer approach for options-derived signals and point to transfer learning and ensembling as promising directions for future work in financial forecasting.
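In a standard ViT, the input image is cut into fixed-size, non-overlapping patches, and each flattened patch becomes one token. The sketch below shows that step on a $10\times 36$ grid, plus one possible encoding of seasonal features. It is a minimal illustration under stated assumptions, not the authors' code: the 2×6 patch size, the sine/cosine encoding of month and weekday, and the names `patchify` and `seasonal_features` are all hypothetical. The summary gives neither the patch size nor how the date features are fed into the model.

```python
import math
from datetime import date

# Hypothetical patch size: the summary only gives the 10 x 36 image size.
PATCH_H, PATCH_W = 2, 6

def patchify(surface, ph=PATCH_H, pw=PATCH_W):
    """Split an H x W grid (list of rows) into flattened, non-overlapping patches."""
    h, w = len(surface), len(surface[0])
    if h % ph or w % pw:
        raise ValueError("surface size must be divisible by the patch size")
    patches = []
    for r in range(0, h, ph):          # patch rows, top to bottom
        for c in range(0, w, pw):      # patch columns, left to right
            # Flatten the patch row by row into a single token vector.
            patches.append([surface[r + i][c + j] for i in range(ph) for j in range(pw)])
    return patches

def seasonal_features(d):
    """Assumed cyclical encoding of month and day of week as sin/cos pairs."""
    m = 2 * math.pi * (d.month - 1) / 12
    wd = 2 * math.pi * d.weekday() / 7
    return [math.sin(m), math.cos(m), math.sin(wd), math.cos(wd)]

# Toy 10 x 36 grid of made-up values, standing in for a real IV surface.
surface = [[0.2 + 0.001 * (r * 36 + c) for c in range(36)] for r in range(10)]
tokens = patchify(surface)
print(len(tokens), len(tokens[0]))  # → 30 12
print(seasonal_features(date(2021, 4, 13)))
```

With a 2×6 patch, the $10\times 36$ grid yields $5 \times 6 = 30$ tokens of 12 values each. A smaller patch produces more tokens and a longer attention sequence, which matters when the model and dataset are small.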

Abstract

Recent work in financial machine learning has shown the virtue of complexity: the phenomenon by which deep learning methods capable of learning highly nonlinear relationships outperform simpler approaches in financial forecasting. While transformer architectures like Informer have shown promise for financial time series forecasting, the application of transformer models for options data remains largely unexplored. We conduct preliminary studies towards the development of a transformer model for options data by training the Vision Transformer (ViT) architecture, typically used in modern image recognition and classification systems, to predict the realized volatility of an asset over the next 30 days from its implied volatility surface (augmented with date information) for a single day. We show that the ViT can learn seasonal patterns and nonlinear features from the IV surface, suggesting a promising direction for model development.
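The prediction target is the realized volatility over the next 30 days. A common definition, assumed here because the summary does not state the authors' exact estimator, is the annualized zero-mean volatility of daily log returns, $\sigma_t = \sqrt{\tfrac{252}{n}\sum_{i=1}^{n} r_{t+i}^2}$ with $r_{t+i} = \log(P_{t+i}/P_{t+i-1})$ and $n = 30$. Whether "30 days" means calendar or trading days, and whether returns are demeaned, are also assumptions. The function name `forward_realized_vol` is hypothetical.

```python
import math

# Assumed conventions: 252 trading days per year, 30 daily returns,
# zero-mean estimator on log returns.
TRADING_DAYS_PER_YEAR = 252

def forward_realized_vol(closes, t, horizon=30):
    """Annualized realized volatility of the `horizon` daily returns after index t."""
    window = closes[t : t + horizon + 1]
    if len(window) < horizon + 1:
        raise ValueError("not enough future prices for the requested horizon")
    # Daily log returns r_i = log(P_i / P_{i-1}) inside the forward window.
    log_rets = [math.log(b / a) for a, b in zip(window[:-1], window[1:])]
    # sigma = sqrt(252 / n * sum(r_i^2)), with n = horizon.
    return math.sqrt(TRADING_DAYS_PER_YEAR / horizon * sum(r * r for r in log_rets))

# Toy price path whose log returns alternate between +1% and -1%.
closes = [100.0 * math.exp(0.01 * (i % 2)) for i in range(31)]
print(round(forward_realized_vol(closes, 0), 4))  # → 0.1587
```

Because the label at day $t$ looks 30 returns ahead, the last 30 days of any training period have no complete target. This is one reason to split train and test sets by calendar year rather than by random sampling.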

Paper Structure

This paper contains 16 sections, 1 equation, 7 figures, 4 tables.

Figures (7)

  • Figure 1: IV surface for NVDA stock on 2021-04-13, presented as a one-channel image instead of the traditional three-dimensional surface. Negative deltas correspond to puts.
  • Figure 2: Vision Transformer architecture applied to our data (left), with a more detailed schematic of the standard transformer architecture used in our model (right). In the deep Vision Transformer, the MLP layer in the Transformer model is repeated.
  • Figure 3: Effect of dataset size on model $R^2$. Where multiple train-test splits are possible, average $R^2$ is reported.
  • Figure 4: Training on one year (left) or four years (right) and predicting the 30-day realized volatility for the following year, using data from 2012–2022. The performance drop for the test year 2020 reflects market disruption during the COVID pandemic. In practice, one could retrain the models iteratively; note that the small models recover their performance on the 2021 test sample.
  • Figure 5: Number of samples per month of data.
  • ...and 2 more figures
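Figures 3 and 4 describe a rolling evaluation. The model trains on one or four consecutive years and is tested on the following year, and $R^2$ is averaged when several splits exist. The sketch below shows that bookkeeping. It assumes $R^2 = 1 - SS_{res}/SS_{tot}$ with $SS_{tot}$ taken about the test-set mean. The summary does not state the benchmark, and some finance work instead measures out-of-sample $R^2$ against a zero or training-mean forecast. `r2_score`, `rolling_year_splits`, `average_r2`, and the `fit`/`predict` callables are hypothetical names, not the authors' code.

```python
from statistics import fmean

def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken about the test-set mean (assumed)."""
    mean = fmean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the targets are constant")
    return 1.0 - ss_res / ss_tot

def rolling_year_splits(years, train_len):
    """Yield (train_years, test_year): train on `train_len` years, test on the next one.

    Assumes the input years have no gaps, so consecutive entries are consecutive years.
    """
    years = sorted(years)
    for i in range(len(years) - train_len):
        yield years[i : i + train_len], years[i + train_len]

def average_r2(samples_by_year, train_len, fit, predict):
    """Mean test R^2 over all rolling splits.

    samples_by_year maps year -> list of (features, target) pairs;
    fit(train_samples) returns a model and predict(model, features) returns a float
    (both hypothetical, caller-supplied).
    """
    scores = []
    for train_years, test_year in rolling_year_splits(samples_by_year, train_len):
        train = [s for y in train_years for s in samples_by_year[y]]
        model = fit(train)
        xs, ys = zip(*samples_by_year[test_year])
        scores.append(r2_score(ys, [predict(model, x) for x in xs]))
    return fmean(scores)

years = range(2012, 2023)
print(len(list(rolling_year_splits(years, 1))))  # → 10 (test years 2013-2022)
print(len(list(rolling_year_splits(years, 4))))  # → 7 (test years 2016-2022)
```

Averaging per-split scores, rather than pooling all test predictions, weights each test year equally. A single disrupted year such as 2020 therefore visibly lowers the mean.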