Realised Volatility Forecasting: Machine Learning via Financial Word Embedding
Eghbal Rahimikia, Stefan Zohren, Ser-Huang Poon
TL;DR
This work investigates whether news-driven embeddings can improve realised volatility (RV) forecasting and whether such signals complement traditional HAR-family models. It compares specialised financial embeddings (FinText) with general-purpose embeddings on both stock-related and general news, evaluates standalone NLP forecasts and ensembles with HAR benchmarks, and examines explainability via SHAP. The findings show that stock-related news carries meaningful predictive content, especially on high-volatility days, and that simple ensembles of NLP forecasts with HAR models deliver statistically and economically meaningful gains, with SHAP attributing forecasts to interpretable, finance-relevant phrases. The results underscore the value of finance-focused word embeddings and offer a practical, transparent NLP framework for volatility forecasting with potential applications beyond RV.
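For readers less familiar with the HAR benchmark, the sketch below shows the standard HAR specification, which regresses next-day RV on the previous day's RV and its 5-day and 22-day averages, fitted by ordinary least squares. It also shows a weighted-average ensemble that combines the HAR forecast with a news-based forecast. Equal weighting is an assumption chosen for illustration; the paper's exact ensemble weights, RV transformation (for example, logs) and estimation window are not stated here. The function names, the synthetic series and the `nlp_f` value are hypothetical.

```python
import numpy as np


def har_row(rv, t):
    """HAR regressors at day t: intercept, daily RV, 5-day mean, 22-day mean."""
    return [1.0, rv[t], rv[t - 4 : t + 1].mean(), rv[t - 21 : t + 1].mean()]


def har_features(rv):
    """Stack HAR regressors (days 21..n-2) and next-day targets for a 1-D RV series."""
    rv = np.asarray(rv, dtype=float)
    X = np.array([har_row(rv, t) for t in range(21, len(rv) - 1)])
    y = rv[22:]  # target for row t is rv[t + 1]
    return X, y


def fit_har(rv):
    """Estimate HAR coefficients by ordinary least squares."""
    X, y = har_features(rv)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def har_forecast(rv, beta):
    """One-step-ahead HAR forecast from the latest 22 observations."""
    rv = np.asarray(rv, dtype=float)
    return float(np.dot(har_row(rv, len(rv) - 1), beta))


def ensemble(forecasts, weights=None):
    """Weighted average of forecasts; equal weights by default (an assumption)."""
    f = np.asarray(forecasts, dtype=float)
    w = np.ones_like(f) if weights is None else np.asarray(weights, dtype=float)
    return float(np.dot(w, f) / w.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rv = np.abs(rng.normal(1.0, 0.2, size=500))  # synthetic RV series, not real data
    beta = fit_har(rv)
    har_f = har_forecast(rv, beta)
    nlp_f = 1.05  # hypothetical placeholder for a news-based forecast
    print(har_f, ensemble([har_f, nlp_f]))
```

Averaging forecasts is the simplest combination rule and needs no extra estimation; a weighted variant is included because the appropriate weights would in practice be chosen on held-out data.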
Abstract
We examine whether news can improve realised volatility forecasting using a modern yet operationally simple NLP framework. News text is transformed into embedding-based representations, and forecasts are evaluated both as a standalone, news-only model and as a complement to standard realised volatility benchmarks. In out-of-sample tests on a cross-section of stocks, news contains useful predictive information, with stronger effects for stock-related content and on high-volatility days. Combining the news-based signal with a leading benchmark yields consistent improvements in statistical performance and economically meaningful gains, while explainability analysis highlights the news themes most relevant to volatility.
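One simple way to turn a news item into an embedding-based representation is to average the word vectors of its tokens. The sketch below assumes mean pooling over lower-cased whitespace tokens and a plain dictionary of vectors; in practice the vectors would come from a trained model such as FinText or a general-purpose embedding, and the paper's own pipeline may combine them differently. `document_embedding` and the toy vectors are hypothetical.

```python
import numpy as np


def document_embedding(text, vectors, dim):
    """Mean-pool the vectors of in-vocabulary tokens; return zeros if none match."""
    tokens = text.lower().split()
    hits = [vectors[tok] for tok in tokens if tok in vectors]
    if not hits:
        return np.zeros(dim)
    return np.mean(hits, axis=0)


if __name__ == "__main__":
    # Toy 2-D vectors for illustration only; trained embeddings use far more dimensions.
    toy = {"profit": np.array([1.0, 0.0]), "warning": np.array([0.0, 1.0])}
    # "issued" is out of vocabulary and is skipped, so the result averages two vectors.
    print(document_embedding("Profit warning issued", toy, dim=2))
```

The resulting fixed-length vector could then serve as the input to a regression model producing a news-only forecast; mean pooling discards word order, which keeps the representation simple at the cost of ignoring phrase structure.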
