Data-Efficient Realized Volatility Forecasting with Vision Transformers
Emi Soroka, Artem Arzyn
TL;DR
The paper tackles forecasting an asset's realized volatility over the next 30 days from the implied volatility (IV) surface of its options when historical samples are limited. It adapts Vision Transformer (ViT) architectures to treat the IV surface as a 1-channel image of size $10\times 36$, augmented with seasonal features, and trains a 4-layer MLP regressor to predict realized volatility, with performance measured by $R^2$; deep and wide ViTs are compared against an MLP baseline. The key findings are that ViTs can learn nonlinear patterns from IV surfaces, with the best 1.7M-parameter ViT achieving approximately $R^2=0.41$ when trained on 2012–2021 data and tested on 2022. Smaller ViTs are data-efficient, larger ones require longer training, and COVID-era data perturbations reduce performance. The results illustrate a data-efficient transformer approach for options-derived signals and point to transfer learning and ensembling as promising directions for future work in financial forecasting.
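To make the image framing concrete, here is a minimal sketch of how a $10\times 36$ IV surface could be cut into ViT-style patch tokens and paired with seasonal date features. The patch size ($2\times 4$), the sine/cosine date encoding, and the helper names `patchify` and `seasonal_features` are assumptions for illustration only; the summary above does not state the paper's actual choices, and the surface here is synthetic.

```python
import numpy as np


def patchify(surface: np.ndarray, patch_h: int, patch_w: int) -> np.ndarray:
    """Split a 2-D surface into non-overlapping patches, one flattened patch per row."""
    h, w = surface.shape
    if h % patch_h or w % patch_w:
        raise ValueError("patch size must divide the surface dimensions")
    # (h/ph, ph, w/pw, pw) -> (h/ph, w/pw, ph, pw) -> (num_patches, ph*pw)
    grid = surface.reshape(h // patch_h, patch_h, w // patch_w, patch_w)
    return grid.transpose(0, 2, 1, 3).reshape(-1, patch_h * patch_w)


def seasonal_features(day_of_year: int, period: float = 365.25) -> np.ndarray:
    """Encode the date as a point on the unit circle (an assumed encoding)."""
    angle = 2.0 * np.pi * day_of_year / period
    return np.array([np.sin(angle), np.cos(angle)])


# Synthetic stand-in for a single day's 1-channel IV surface.
rng = np.random.default_rng(0)
iv_surface = rng.uniform(0.1, 0.6, size=(10, 36))

tokens = patchify(iv_surface, patch_h=2, patch_w=4)  # 5 * 9 = 45 patches of 8 values
date_feats = seasonal_features(day_of_year=45)
```

In a full model each row of `tokens` would pass through a learned linear embedding before the transformer layers; how the date features are injected (as an extra token, or concatenated before the regression head) is a design choice this sketch leaves open.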
Abstract
Recent work in financial machine learning has shown the virtue of complexity: the phenomenon by which deep learning methods capable of learning highly nonlinear relationships outperform simpler approaches in financial forecasting. While transformer architectures like Informer have shown promise for financial time series forecasting, the application of transformer models to options data remains largely unexplored. We conduct preliminary studies towards the development of a transformer model for options data by training the Vision Transformer (ViT) architecture, typically used in modern image recognition and classification systems, to predict the realized volatility of an asset over the next 30 days from its implied volatility (IV) surface for a single day, augmented with date information. We show that the ViT can learn seasonal patterns and nonlinear features from the IV surface, suggesting a promising direction for model development.
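As an illustration of the prediction target and the evaluation metric, the sketch below computes a forward-looking realized volatility series from daily prices and scores predictions with $R^2$. The definitions are assumptions, not taken from the paper: realized volatility is taken here as the annualized root mean square of daily log returns over a 30-observation window, $R^2$ is the standard coefficient of determination, and the names `forward_realized_vol` and `r2_score` are hypothetical.

```python
import numpy as np


def forward_realized_vol(prices, window: int = 30, periods_per_year: int = 252) -> np.ndarray:
    """Annualized realized volatility over the next `window` returns, for each start day."""
    prices = np.asarray(prices, dtype=float)
    log_ret = np.diff(np.log(prices))  # log_ret[t] is the return from day t to day t+1
    n = len(log_ret) - window + 1
    if n <= 0:
        raise ValueError("not enough prices for the requested window")
    out = np.empty(n)
    for t in range(n):
        # Root mean square of the next `window` log returns, scaled to annual terms.
        out[t] = np.sqrt(periods_per_year * np.mean(log_ret[t:t + window] ** 2))
    return out


def r2_score(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic price path with a constant daily log return of 1%.
prices = np.exp(0.01 * np.arange(40))
rv = forward_realized_vol(prices, window=30)

# Scoring sanity checks on a small hand-written target vector.
y = np.array([0.10, 0.20, 0.15, 0.30])
perfect = r2_score(y, y)
mean_only = r2_score(y, np.full_like(y, y.mean()))
```

Under this definition a model that always predicts the sample mean scores $R^2 = 0$, which gives a reference point for reading a test-set value such as the roughly $0.41$ reported in the summary.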
